Beware the rise of the radical right


Academic freedom is on the hit list when politicians of the extreme
right gain office, as they have done in some European countries.


Hidden inside a 1970s office block close to London’s Waterloo station
is a tiny organization that has helped tens of thousands of academics
find sanctuary from conflict. Co-founded 85 years ago by the economist
William Beveridge and physicist Ernest Rutherford, the organization,
now called the Council for At-Risk Academics (CARA), helped many
notable twentieth-century scientists, including biochemist Hans Krebs
and philosopher Karl Popper, to escape the Nazis and settle at British
universities. In recent years it has reached out to the Middle East and
receives the largest volume of applications from Yemen and Iraq.

CARA and its counterparts in other countries exist because 
governments in the host nations value three of the pillars on which 
democracy rests: the rule of law, a free press and, as we explore in a
Comment article on page 621, freedom of academic enquiry. If the 
British government were to decide not to support even one of these, 
CARA would struggle to carry on. 

Such an alarming scenario is not purely hypothetical. For at least 
the past two decades, citizens of countries in the European Union have 
increasingly been voting for parties of the extreme right (also known 
as the populist right or radical right). From almost no representation 
in the 1990s, these parties are in governing coalitions in 10 out of 
the EU’s 28 member states, including in Austria, Hungary, Italy and 
Poland. Next May sees elections to the European Parliament in which 
right-wing parties are expected to increase their combined tally of 
78 seats in the 751-seat chamber.

When parties of either the extreme right or extreme left take power, 
any one of democracy’s foundational pillars can be knocked away. 

Journalists and their families are intimidated. Judges are demonized 
and replaced with allies. People from minority groups are singled out 
for their alleged disloyalty. And action is taken against academics: uni- 
versities are brought under direct state control and staff are subjected 
to loyalty tests. 

It’s a classic playbook to quash dissent. Take Poland for example, 
where the state has moved to exert control over the media and judici- 
ary. Academic freedom is under threat too. A barometer for the risk it 
could face will be how much protest the Polish government allows, if 
any, over its pro-coal stance — which climate scientists have warned 
against — during the annual United Nations climate talks to be held 
in Poland next month. 

Although there has been much media attention on the phenomenon 
of the populist right, the implications for academic freedom have gone 
largely unreported. Even where there has been widespread coverage — 
such as the case of Hungary’s Central European University which was 
forced to enrol new students in Vienna rather than Budapest — EU 
institutions such as the European Council and the European Parliament 
have been largely powerless to take action. 

Europe’s heads of government are biting their lips, and their reasons
for doing so are understandable, even if European agreements or
conventions are being violated. There is, of course, the principle of
non-interference in the affairs of a sovereign state. But, in addition,
the EU works through the collective solidarity of its member states.
This is what has enabled the organization to enact progressive policies
in climate change, anti-discrimination legislation and employee rights.
But collective progressivism breaks down when one-third of EU
governments include political parties with scant commitment to
protecting democratic institutions. Now that EU governments include
parties who do not believe in the rights of people from minority
groups, the consensus on climate change, or, indeed, academic freedom,
it will become more difficult for the EU as a whole to either advance,
advocate or protect policies in these fields.

“What’s wrong with the world is not nationalism itself,” noted Michael
Ignatieff, the embattled rector of the Central European University.
What’s wrong, he added, “is the kind of nation, the kind of home that
nationalists want to create and the means they use to seek their
ends.”

Ignatieff wrote these words more than 20 years ago in Blood and
Belonging (BBC Books, 1993), at the end of a series of journeys into
some of Europe’s conflict zones. But he remains optimistic about the
continent’s future. “I don’t want to predict doom and gloom,” he told
Nature. “Regimes come and go, but universities remain.”

Academics everywhere will hope he’s right. They, and we, can help
by speaking out against injustice and specific cases where academic
freedom is threatened, by any regime.




Breeze block 


Wind farms must be built responsibly so they 
don’t create an inefficient wake for neighbours. 


Like many words in the English language, ‘overbearing’ has a nautical
connection. It describes a manoeuvre in which one sailing ship steers
directly downwind towards another, effectively snatching away the
overborne vessel’s wind to leave it powerless. Wind turbines can
overbear each other, too. As developers seek to build ever more of them
(globally, installed onshore wind capacity rose to almost 500 gigawatts
last year, up from just 92 GW in 2007), some of the best blustery
locations are getting crowded. That could be a problem. To work best,
wind turbines need to capture a clear and uninterrupted stream of
moving air. Anything that gets in the way, from mountains and buildings
to a rival wind farm, reduces wind speed and the electricity generated.
Such obstacles also break up the air flow, and the resulting turbulence
increases noise, as well as wear and tear on the turbine blades.

A study published in Nature Energy this week shows just how overbearing
this effect can be (J. K. Lundquist et al. Nature Energ.
https://doi.org/10.1038/s41560-018-0281-2; 2018). It analysed the change in
electricity production at a wind farm in West Texas when another farm 
was built a few hundred metres upwind and switched on 18 months 
after the first farm opened. The researchers estimate that the down- 
wind farm may have lost 5% of its potential on average, and as much 
as US$2 million annually in electricity production. Texas is unusual: 
it has the largest number of wind turbines in the United States, with 
more than 12,000 devices spread across 131 separate farms. Inevitably, 
the separate projects are clustering at the best sites, which have reli- 
able wind and access to transmission lines. In the study, some of the 
turbines in the upwind farm stand just 300 metres from some of the 
downwind turbines. 

But the study authors say the impact could stretch much further. 
Under the right atmospheric conditions, the decreases in downwind 
wind speeds can extend for 50 kilometres or more. Almost 90% of US 
wind farms have a neighbouring project within 40 km, and so could 
be affected. (Of course, not all of them would be affected all the time, 
because the wind changes direction. The Texas study looked only at 
the impact under the prevailing southwesterlies.) There is also inevi- 
tably internal disruption within a single wind project, with the upwind 
turbines creating a wake that reduces the output of those behind. 

One solution to wind farms treading on each other’s toes is to leave
the land behind and head to the vast spaces of the oceans. But offshore
wind farms, typically much more expensive to build and run, also tend
to compete for the best sites. In 2014, the Danish firm DONG Energy
Wind Power (now Ørsted, based in Skærbæk) published data to show how
the performance of its long-standing project at Nysted, close to the
island of Lolland in the Baltic Sea, was being undermined by another
company’s wind farm constructed just 3 km away (N. G. Nygaard J. Phys.
Conf. Ser. 524, 012162; 2014).

What can be done? Technical fixes to the design or layout of projects
are difficult, especially as wind turbines grow larger and more
powerful. Some engineers have proposed offshore turbines that float
and can shift position to reduce wake as the wind moves, but that’s
clearly impossible on land. Could rules and restrictions work? A legal
analysis by the study authors found no relevant legislation in place
in the United States. As a comparison, solar-power efficiency in
California is protected by regulations to limit the amount of shadow
from neighbouring properties that can fall on panels during peak
operating hours.

Where they exist, restrictions on the construction of wind turbines
often focus on more immediate risks. In a 2008 dispute between rival
developers who wanted to build wind farms on adjoining properties in
North Dakota, officials ruled only that each turbine must be placed
further than its own height from the boundary, so that if it fell it
would not land on the other side. Wind shadow wasn’t considered.

It’s crucial in a warming world to support efforts to boost wind 
power, and therefore important to install wind farms responsibly to 
ensure that we harness as much energy as possible, even if the facili- 
ties are close together. That means it’s important to craft regulations 
to support such development. 

One country has long taken an enlightened view, and could offer a 
model to follow. The Netherlands is famous for its windmills, many of 
which still function, thanks to a law that guarantees each mill can con- 
tinue to fill its sails with the necessary wind (called its molenbiotoop, or 
windmill biotope) by restricting development within 375 m. The law 
has led to some creative solutions: in 2010, a flour mill in Spijkenisse 
from the 1860s was cut from the ground, raised and placed on a
7 m-high concrete collar to allow houses to be built nearby. Where
there’s a mill, there’s a way.




Ban bullying 


All institutions need a procedure for dealing with bullies.


Science can be difficult enough even if you work in a great laboratory
with supportive colleagues. So the added pressure of a boss or
co-worker who regularly abuses, trivializes, hassles, belittles and
unfairly criticizes is not just a problem for the individual
concerned. It’s bad for research.

Such workplace bullying thrives on silence. But, as occurred with 
sexual harassment, there is growing noise about bullying in science. 
Already this year, allegations of bullying have rocked the world of 
astrophysics, closely followed by those of cancer genetics, neuroscience 
and vertebrate palaeontology. 

Much of this additional scrutiny is down to the willingness of 
scientists to speak out. Now is the time for more institutions to follow 
their lead and step up to take decisive action. Does your institution 
have an anti-bullying policy? If you work in Britain, the answer is 
probably yes; but if you work in countries such as the United States, 
the answer might be no. As a News Feature on bullying in science 
highlights this week (page 616), few US institutions have policies that 
explicitly prohibit their staff from bullying others. Such behaviour 
might be covered by anti-harassment policies, but in those cases, 
targeted staff members can seek redress from their employer only if 
they fall into a group protected by employment law and can show 
that they have been targeted because of their sex, race, religion or age. 
The motivation of a bully should not be the issue here. Bullying is 
unacceptable, and more employers must make that clear. 


What to do? If you feel that you are being singled out for unfair
treatment by your boss or colleague, you have several options, and one 
of them is to talk to others. You will need support from your friends 
and family, and no one can help you if they don't know it’s happening. 
By sharing your story with trusted peers, you might discover that other 
people you work with are going through the same thing. 

Seek advice about what you can do to address the problem. Speak 
to someone in your institution’s human-resources department or a 
manager about how to solve the problem informally. If you belong to 
a union, you can ask it for advice. It can be helpful to keep a diary of 
the problematic behaviour. If you feel confident enough and it is safe 
to do so, think about speaking to the bully. Calmly try to tell them that 
you find their behaviour unacceptable and ask them to stop. 

Many who have been through the process can testify to the professional
upheaval and emotional turmoil that comes with reporting a bully. It is
easy for those who are not sitting in the eye of the storm
to extol the virtues of flagging up bullying for the greater good of 
science and society. There are no easy answers, and some cases might 
boil down to one person’s word against another's. 

This is why institutions need to step up to the mark. Reports of 
bullying should be fairly and thoroughly investigated, with attention to 
due process. Anti-bullying policies or codes of conduct for staff should 
be easily accessible, give clear guidance on what behaviours are and 
are not appropriate in the workplace, and outline the measures that 
would be taken if allegations are reported. 

Crucially, institutions need to follow these policies to the letter, 
regardless of whether the alleged perpetrator is the director of the 
institute or a first-year PhD student, to protect all those involved — 
including the accused, who might be the victim of malicious allegations. 
Incomplete or unfair investigations can undermine the credibility of an 
organization, harm careers and signal to bullies that their behaviour will 
be tolerated. In 2018, that is unacceptable.
WORLD VIEW


“I’m a human first, and then I learned to be a scientist. If I forget
the human part, then that’s a problem.”

This is what I heard when I interviewed 52 scientists recog- 
nized as exemplary by their peers for their scientific accomplishments 
and conduct. Related themes come up in my work with scientists who 
have been referred to a formal remediation programme after lapses 
in research integrity. 

I'm an organizational psychologist, specializing in the scientific 
workplace. What interests me are the decisions and behaviours that 
yield innovative, rigorous, ethical research. 

The past few months have drawn attention to unhealthy working 
environments, especially bullying in academia. We should also focus
on a related, widespread problem: mentors who have excellent intentions
but limited knowledge of how to create a healthy workplace.

Many scientists whom I work with feel that 
they lack management and leadership skills. 
They want help with concrete tasks such as 
coordinating projects or facilitating meetings. 
But what comes up most emphatically is that 
conducting research requires them to establish 
and maintain positive relationships in the lab. 

Many researchers in our remediation 
programme have had strained interactions with 
compliance officers and have struggled in their 
roles as supervisors. By contrast, exemplars 
resoundingly emphasize how they foster good 
team dynamics by being involved, approachable 
and aware of the workplace atmosphere. As one 
told me: “Rule number one in the lab is harmony. 
First and foremost, we have to get along, we have to respect each other, 
we have to trust each other, and that is the operating principle for 
everything else.” 

Yet, given the choice between working on a scientific paper or 
broaching a difficult conversation, many researchers pick the former 
— the task that feels most directly connected to research goals. Prin- 
cipal investigators might need to work consciously against the feeling 
that ‘nothing is getting done’ during personal interactions. Because, 
whether it is mentoring a struggling trainee or celebrating a hard- 
won achievement, investing in strong, respectful relationships is an 
investment in effective science. 

So, what to do? All principal investigators should add relationship 
building to their to-do lists. 

Task one: put recurring one-on-one meetings with the members of 
your group on your calendar. Set up a notebook or spreadsheet and jot 
down anything you should bring up during these meetings. Set an alert 
for ten minutes before the appointment to decide how to approach 
the meeting. Does the team member need encouragement? Career 
guidance? Feedback on their project and direction for next steps? Are 
they behind on deadlines or lacking confidence? Try a respectful, yet 
firm, nudge. Have you expressed gratitude for their contribution? As
one exemplar noted: “I value what they do, and I tell them.”


Be human first, a scientist second

Want to get the best research from your team? Take these six steps to
invest in stronger relationships, urges Alison Antes.

Ask yourself whether it is time for a difficult conversation. If so,
grasp the nettle. That is part of a leader’s job. Sometimes principal
investigators worry that they will damage relationships by having
challenging discussions. In the long run, the opposite is true. Use
your ten minutes to list a few observations. State the specific
behaviour of concern; describe how it affected you, the team or the
project. Then, ask the person for their perspective. If there is
discord in the lab, speak to the individuals involved, state your
expectation of mutual respect, and ask them to discuss and identify a
solution.

Task two: invite people to share both complaints and highlights.
Several exemplary scientists explicitly require their trainees to relate a 
concern or struggle at some point in one-on-one meetings. They want 
to help people to be comfortable enough to bring problems and mis- 
takes to light, and so address issues early, while 
they are manageable. Several exemplars noted 
that researchers need outlets for discussing frus- 
trations and anxieties. They know it is difficult 
to show up and do your best when plagued by 
worry. And they want to know what is working 
well in the lab, so as to leverage these successes. 

Task three: walk the ‘shop floor’. Even when 
team members are welcome to visit your office, 
visibility supports approachability, impromptu 
brainstorming and immediate trouble-shooting. 

Task four: model desired behaviour in team 
meetings. How you communicate will carry 
over into peer-to-peer interaction in your 
group. Ask questions, expect participation and 
prompt people to share their thoughts. Find out where obstacles are. 
Encourage cooperation and mutual support. Explicitly state that you 
value a collaborative spirit in your group. 

Task five: schedule a few social occasions for people to spend time 
together in a more relaxed way. Such activities might feel far removed 
from science, but they can ease tensions in the lab. Start small. Be sure 
to accommodate the needs of parents and carers, people with cultural 
or religious considerations and those on tight budgets. 

Task six: advocate outside the lab. Talk about these practices in your 
department, share those that work and learn from people known to 
be great team leaders. 

New principal investigators commonly adopt the practices of their 
own mentors without reflection, and often their role models were not 
ideal. Some relationship-building tasks will feel awkward at first; that’s 
okay. Showing that you care is more important than showing that you 
are perfect.


Alison Antes is assistant director of the Center for Clinical and 
Research Ethics at Washington University School of Medicine in 
St. Louis, Missouri. 

e-mail: aantes@wustl.edu 
Scotland’s research


The Scottish government and seven bodies representing the nation’s
research and higher-education sectors, including Universities Scotland
and the Royal Society of Edinburgh, have agreed to work together to
protect Scottish research from Brexit. The alliance released a joint
statement on 22 November, coinciding with a summit at the University of
Glasgow that attracted research leaders from across Scotland. The group
aims to use its influence to press the UK government for clarifications
and firmer guarantees on research, as well as to give Scotland a visa
system that allows overseas students and postgraduates to stay on and
work after their studies. Around 27% of Scotland’s full-time
researchers and 10% of its university students are nationals of other
European Union countries. Once Britain leaves the EU, it is likely to
be harder for them to come to the United Kingdom to work.


Brexit framework


European Union leaders meeting in Brussels on 25 November approved a
declaration on the EU’s future relationship with the United Kingdom.
Billed as a framework that will form the basis of a trade deal beyond
the end of a transition period in December 2020, the 26-page document
confirms Britain’s intention to end free movement across its borders.
It includes a pledge by both parties to consider arrangements “for
entry and stay for purposes such as research, study, training and youth
exchanges”. The document reiterates the United Kingdom’s intention to
pay to participate in EU programmes in areas including science and
innovation. And it says that the country aims to remain a part of the
European Research Infrastructure Consortium of research networks, two
of which it currently hosts, and of the European Defence Fund, a scheme
established last year whose research budget could rise to around
€500 million (US$570 million) a year from 2021. At the same meeting,
the EU leaders endorsed a withdrawal agreement, published on
14 November, which lays out the terms of Britain’s exit from the EU.


Dire US climate-change forecast


Climate change is already affecting life in the United States, and its
impacts are set to become more dramatic in the coming decades,
according to the US government’s fourth national climate assessment.
The analysis, produced by 13 federal agencies and required by law every
4 years, was released on 23 November. Among other things, the report
finds that higher temperatures and drier conditions have led to more
large fires in the western United States (pictured), and the
combination of rising seas and extreme precipitation has boosted flood
risks along the east coast. The core message of the document
contradicts positions taken by US President Donald Trump’s
administration, which released the report on Black Friday, the day
after the US Thanksgiving holiday. Some scientists and
environmentalists have suggested that this timing was part of an
attempt to downplay the report’s findings.


China’s mega ‘LHC’


Beijing’s Institute of High Energy Physics (IHEP) is designing the
world’s biggest particle smasher. The 100-kilometre-circumference
facility would dwarf the 27-kilometre Large Hadron Collider (LHC) at
CERN, Europe’s particle-physics laboratory near Geneva, Switzerland.
The ambitious 30-billion-yuan (US$4.3-billion) facility, known as the
Circular Electron-Positron Collider, is the brainchild of IHEP’s
director, Wang Yifang. The collider will produce Higgs bosons by
smashing together electrons and their antimatter counterparts,
positrons. Because these are fundamental particles, their collisions
are cleaner and easier to decipher than the LHC’s proton-proton
collisions, so once the Chinese facility opens, in about 2030, it will
allow physicists to study the mysterious particle and its decay in
exquisite detail. Initial funding has come from the Chinese government,
but the design is the work of an international collaboration of
physicists that hopes to garner international funding. The blueprints
published on 14 November reveal that the collider would run in a circle
100 metres underground, at a location yet to be decided, and would host
two detectors.


Science in a cloud


A preliminary version of the European Open Science Cloud online portal
launched on 23 November. Scheduled to become available in full in 2020,
the portal is intended to make it easier for European researchers to
store, analyse, share and reuse data. The launch of
www.eosc-portal.eu, which will eventually provide a single entry point
to existing data repositories, as well as cloud-computing facilities
and analysis tools, comes after two years of consultation and
development. At the launch in Vienna, the European Commission also
announced the make-up of the initiative’s executive board, which will
comprise representatives of university associations, data
infrastructures and research institutes, as well as three independent
experts. The commission plans to allocate €600 million
(US$680 million) to the initiative by 2020.


African artefacts


Tens of thousands of African artefacts in French museums should be
returned, concludes a 23 November report commissioned by the country’s
president, Emmanuel Macron. The report, by economist Felwine Sarr at
Gaston Berger University in Saint-Louis, Senegal, and historian
Bénédicte Savoy of the Collège de France in Paris, calls on France to
amend its laws to allow for the repatriation of cultural artefacts
acquired during the French colonial period in Africa (pictured), if
African countries request their return. This includes artefacts from
the late nineteenth century until 1960 and those later acquired
illicitly. The Quai Branly Museum in Paris holds some 70,000 objects
from sub-Saharan Africa.


POLICY


Salk settlement


The prestigious Salk Institute for Biological Studies in La Jolla,
California, has settled the final one of three high-profile
gender-discrimination lawsuits filed last year. The agreement was
announced on 21 November. Molecular biologist Beverly Emerson filed the
suit in July 2017, arguing that discrimination against women at the
Salk had limited her wages, laboratory space and research funding. Two
other senior female scientists brought similar suits against the
institute, and settled their cases out of court in August 2018. Emerson
had worked at the Salk for more than three decades, but in December
last year, the institute declined to renew her contract. She is now at
Oregon Health and Science University in Portland. “Salk recognizes
Dr. Emerson’s more than thirty years of service to the Institute and
looks forward to her continued contributions to the scientific
community,” says a joint statement from Emerson and the institute,
e-mailed to Nature. The statement does not provide any further
information on the settlement. Alreen Haeggquist, Emerson’s lawyer,
says that neither she nor Emerson has further comment.


Plan S detailed


A group of 16 science funders have detailed their ambitious plan to
ensure that, by 2020, the results of the research they support are
immediately free to read. Since the September launch of the initiative,
known as Plan S, scientists have speculated as to how it could affect
their research. Many publishers have also expressed serious concerns
about the proposal and have questioned its rationale for excluding
‘hybrid journals’: journals that allow researchers to make their work
free to read if they pay a fee, but that keep other studies behind a
paywall. Now, documents released on 26 November clarify that
researchers will be allowed to publish in hybrid journals if they can
post the accepted manuscript or final article in an approved
open-access repository at the time of publication, but in these cases,
the funder will not pay for publishing. The plan’s documents also list
three ways in which researchers can publish work that is compliant with
the plan: publish in an open-access journal or platform approved by the
funders; immediately put a copy of the manuscript accepted by the
journal, or the final published article, in an approved open-access
repository; or use a hybrid journal that intends to become a full
open-access venue. Under all three routes, the papers must be published
with a liberal CC BY licence, which allows commercial reuse of the
papers’ findings. The posting of an article to a preprint server is
not, alone, sufficient to comply with these rules.


TREND WATCH


Biomedical research is becoming more open and transparent by providing
increasing amounts of information about funding, conflicts of interest
and data sharing in its publications, according to a survey of recent
papers. John Ioannidis at Stanford University in California and
colleagues examined 149 papers published between 2015 and 2017 to see
how many included information on indicators of transparency, such as
who funded the work, potential conflicts of interest, and the
availability of the raw data and complete research protocols. They
found that the majority of papers contained statements on funding and
conflicts of interest (69% and 65%, respectively), and almost one in
five mentioned publicly available data, although only one paper
included a link to a full study protocol (J. D. Wallach et al. PLoS
Biol. http://doi.org/cxd6; 2018).

The survey results were a big improvement on those of a previous study
by some of the same researchers. This found that, in a sample of
441 articles published between 2000 and 2014, most contained almost no
information on funding, conflicts of interest or data sharing.

OPENING UP (figure): The proportion of biomedical journal articles that
provide information on their funding, conflicts of interests and data
is on the rise.
NEWS IN FOCUS


TECHNOLOGY AI peer reviewers unleashed to ease publishing grind p.609

COMMUNITY Can a conference with a reputation for sexism change its
ways? p.610

HEALTH Frustrated researchers seek mice that better model Alzheimer’s
disease p.611

CLIMATE Researchers will test a way to cool the planet with reflective
particles p.613


A Chinese scientist claims that twin girls have been born whose genomes were edited at the embryo stage. 


International outcry over genome-edited baby claim


The revelation from a Chinese scientist represents a controversial leap in genome editing. 


BY DAVID CYRANOSKI & HEIDI LEDFORD 


Geneticists are shocked and outraged by reports that a Chinese
scientist claims to have helped make the world’s first genome-edited
babies: twin girls, who were born this month.

He Jiankui, a genome-editing researcher 
at the Southern University of Science and 
Technology of China in Shenzhen, says that 
he impregnated a woman with embryos that 
had been edited to disable the genetic pathway 
HIV uses to infect cells. 


In a video posted to YouTube on 26 November, He says the girls are
healthy and now at
home with their parents. Sequencing of the 
babies’ DNA has shown that the editing 
worked, and altered only the target gene, he 
says. The scientist's claims have not been veri- 
fied through independent genome testing, nor 
published in a peer-reviewed journal. Later 
that day, the Chinese government announced 
an investigation into the claims. 

If the report is true, the twins’ birth would 
represent a significant — and controversial — 
leap in the use of genome editing. Until now,
the use of these tools in embryos has been limited
to research, often to investigate the benefit
of using the technology to eliminate disease- 
causing mutations from the human germ line. 
But some studies have reported off-target 
effects, raising significant safety concerns. 
Documents posted on China’s clinical- 
trial registry show that He used the popular 
CRISPR-Cas9 genome-editing tool to disable 
a gene called CCR5, which encodes a protein
that allows HIV to enter a cell. Genome- 
editing scientist Fyodor Urnov was asked 
to review documents that described the
claimed experiments for an article in MIT
Technology Review. “The data I reviewed are 
consistent with the fact that the editing has, in 
fact, taken place,” says Urnov, who is based at
the Altius Institute for Biomedical Sciences in 
Seattle, Washington. But he adds that the only 
way to tell whether the children’s genomes have 
been edited is to independently test their DNA. 

Urnov takes issue with the decision to edit 
an embryo’s genome to prevent HIV infection.
He is also using genome-editing tools to target 
the CCR5 gene, but his studies are in people 
with HIV, not embryos. He says that there are 
“safe and effective ways” to use genetics to protect
people from HIV that do not involve edit-
ing an embryo’s genes. 

Paula Cannon, who studies HIV at the Uni- 
versity of Southern California in Los Angeles, 
also questions He's decision to target that gene 
in embryos. She says that some strains of HIV 
don't even use this protein to enter cells; they
use another protein called CXCR4. Even people 
who are naturally CCR5-negative are not com- 
pletely resistant to HIV, Cannon adds, because 
they could be infected by a CXCR4 strain. 

She also says it makes no sense that He 
recruited families with an HIV-positive father, 
as was the case with the twins, because there 
is no real risk of transmission to the children. 

“This experiment exposes healthy normal 
children to risks of gene editing for no real 
necessary benefit,” says Julian Savulescu, direc- 
tor of the Oxford Uehiro Centre for Practical 
Ethics at the University of Oxford, UK. 


In an interview with the Associated Press, 
He said the goal of the work was not to prevent 
transmission from the parents, but to offer 
couples affected by HIV a chance to have a 
child that might be protected from a similar 
fate. But years of research is needed to show 
that meddling with the genome of an embryo 
is not going to cause harm, says Joyce Harper, 
who studies women’s and reproductive health 
at University College London. Legislation and 
public discussion should also occur before 
genome editing is used in embryos
destined for implantation.

Southern University of Science and
Technology said in a statement on
26 November that
it was unaware of He’s experiments, that the 
work was not performed at the university and 
that He has been on leave since February. The 
university says its researchers must abide by 
national laws and regulations, and respect 
international academic ethics and academic 
standards. It will set up an independent com- 
mittee to investigate the matter. 

Making gene-edited babies goes against regulations
released by China’s health and science
ministries in 2003, but it is not clear whether 
there are penalties for those who break the rules. 

More than 100 Chinese biomedical researchers
posted a strongly worded statement
online condemning He’s claims. “Directly
jumping into human experiments can only
be described as crazy,” the statement reads.
The scientists call on Chinese authorities to 
release the findings of any investigation to the 
public. 

“This is a huge blow to the international 
reputation and the development of Chinese 
science, especially in the field of biomedical 
research,” the statement says. “It is extremely
unfair to the large majority of diligent and con- 
scientious scientists in China who are pursuing 
research and innovation while strictly adher- 
ing to ethical limits.” 

Nature tried to contact He but did not 
receive a response before its deadline. In his 
video, He says he supports the use of genome 
editing in embryos only in cases that relate to 
disease. “I understand my work will be contro- 
versial, but I believe families need this technol- 
ogy and I am willing to take the criticism for 
them,” he says.

News of the experiment came a day before 
researchers in the field gathered in Hong Kong 
for a major international meeting on genome 
editing, running from 27 to 29 November. 
Even before the news of He’s work emerged, 
many in the field thought it was inevitable that 
someone would use genome-editing tools to 
make changes to human embryos for implan- 
tation into women, and had been pushing for 
an international consensus on how genome 
editing to modify eggs, sperm or embryos 
should proceed.


PLANETARY SCIENCE 


‘Marsquake’ hunter begins 
to probe planet’s innards 


Joint US–French–German mission will monitor seismic activity on Mars.


BY ALEXANDRA WITZE 


Earthlings are about to hear Mars’s heartbeat.


On 26 November, NASA's InSight 
mission touched down near the Martian 
equator and embarked on the first mission 
dedicated to listening for seismic energy 
rippling through the red planet. 

Any ‘marsquakes’ InSight detects could yield 
clues about the planet’s mysterious interior, 
including how it is separated into a core, man- 
tle and crust. Whatever scientists learn about 
Mars’s innards could help to illuminate how 
our own planet evolved billions of years ago. 

InSight had been cruising through space 
since its launch in May, tracked by mission 
control at NASA’s Jet Propulsion Laboratory
(JPL) in Pasadena, California. On Monday,
just before 11:53 a.m. local time, the space- 
craft entered the Martian atmosphere at nearly 
20,000 kilometres per hour. 

As it neared Mars’s surface, the spacecraft 
demonstrated a new way to communicate with 
its controllers on Earth, 146 million kilometres 
away. Two ‘cubesats’, each the size of a briefcase,
relayed information from InSight to Earth in 
close to real time. The experiment suggests that 
miniature satellites like these could allow faster 
communication with probes in deep space. 

InSight landed at Elysium Planitia, a broad, 
flat region just north of the Martian equator. It 
is one of the most boring places on the planet, 
says Bruce Banerdt, a planetary scientist at JPL 
and the US$994-million mission's principal 
investigator. That’s an advantage for InSight, 
which needs a safe, geologically stable place to
do its work. 

The first photo that InSight sent from the 
surface of Mars showed a flat, relatively rock- 
free landscape stretching to the horizon, with 
the foreground speckled with dust from the 
landing. 

“It’s happy. The lander is not complaining,”
said Rob Manning, chief engineer at JPL,
shortly after InSight touched down. 


LISTENING IN 

Mission scientists will use the lander’s cam- 
era to scout the ground for the smoothest and 
most level area to deploy its French-built seis- 
mometer (see ‘Ear to the ground’). InSight’s 
robotic arm will pluck the instrument off its 
back and place it on the ground, then put a
dome-shaped wind shield over it. The whole
process is expected to take several days.

EAR TO THE GROUND
NASA's Mars InSight lander will gather data on seismic activity to help scientists better understand the red planet’s mysterious interior.
The 1.8-metre robotic arm will place a seismometer and heat probe onto Mars’s surface.
The lander’s seismometer will listen for tremors known as marsquakes.
Two solar panels will supply power to the lander and its instruments.
A heat probe will dig down 5 metres to measure temperature change over depth and time.
NASA/JPL-CALTECH

The seismometer includes three ground-motion
sensors nested inside a vacuum, and
its sensitivity allows it to detect movement as
small as the width of an atom. The big challenge
will be determining which movements
are caused by marsquakes and which are the
result of jostling by the wind or other sources.
On the third day after landing, project sci- 
entists will switch on an instrument to track 
changes in the magnetic field, which will help 
them to identify sources of noise that aren't 
quakes, says Catherine Johnson, a geophysi- 
cist at the University of British Columbia in 
Vancouver, Canada. 

InSight won’t deploy its German-built heat-flow
probe until January. Over the course of
several weeks, the instrument will drill five 
metres into the Martian surface, deeper than 
anything achieved before. Scientists will track 
changes in temperature as small as a few hun- 
dredths of a degree. That will tell them how 
much heat is leaving Mars, and how many 
heat-producing radioactive elements are 
packed inside it. 

InSight is meant to work for a little more 
than one Martian year, equivalent to almost 
two Earth years. It should measure 50–100
marsquakes during that period, says Banerdt. 
The longer it survives, the more it will be able 
to detect — and the more researchers will be 
able to deduce about Mars’s internal structure.


The age of AI peer reviews 


Automated software can help review papers, but the decision-making stays with humans. 


BY DOUGLAS HEAVEN 


Most researchers have good reason
to grumble about peer review: it is
time-consuming and error-prone,
and the workload is unevenly spread, with 
just 20% of scientists taking on most reviews. 

Now peer review by artificial intelligence 
(AI) is promising to improve the process, 
boost the quality of published papers — and 
save reviewers time. A handful of academic 
publishers are piloting AI tools to do anything 
from selecting reviewers to checking statistics 
and summarizing a paper's findings. 

In June, software called StatReviewer, 
which checks that statistics and methods in 
manuscripts are sound, was adopted by Aries 
Systems, a peer-review management system 
owned by Amsterdam-based publishing giant 
Elsevier. And ScholarOne, a peer-review plat- 
form used by many journals, is teaming up 
with UNSILO of Aarhus, Denmark, which 
uses natural language processing and machine 
learning to analyse manuscripts. 

UNSILO uses semantic analysis of the 
manuscript text to extract what it identifies as 
the main statements. This gives a better overview
of a paper than the keywords typically submitted
by authors, says Neil Christensen, sales
director at UNSILO. “We find the important
phrases in what they have actually written,” he
says, “instead of just taking what they've come 
up with five minutes before submission.” 

UNSILO identifies which of these key phrases 
are most likely to be claims or findings, giving 
editors an at-a-glance summary of the results. 
It also highlights whether the claims are simi- 
lar to those from previous papers, which could 
be used to detect plagiarism or simply to place 
the manuscript in context with related work in 
the wider literature. “The tool’s not making a
decision,” says Christensen. “It’s just saying:
‘Here are some things that stand out
when comparing this manuscript with
everything that’s been published before.
You be the judge.’”

“It doesn’t replace editorial judgement but, by
God, it makes it easier,” says David Worlock, a
UK-based publishing consultant who saw the 
UNSILO demonstration at the Frankfurt Book 
Fair in Germany last month. 

Worlock notes that there are several similar 
tools emerging. He is on the board of Wizdom.ai 
in London, a start-up owned by publishers Tay- 
lor & Francis, which is developing software 
that can mine paper databases and extract 
connections between different disciplines and
concepts. He says that this kind of tool will be
useful beyond peer review, for tasks such as 
writing grant applications or literature reviews. 

Many platforms, including ScholarOne, 
already have automatic plagiarism checkers. 
And services including Penelope.ai examine 
whether the references and the structure of a 
manuscript meet a journal’s requirements. Some 
can flag up issues with the quality of a study, 
too. The tool statcheck, developed by Michèle
Nuijten, a methodologist at Tilburg University 
in the Netherlands, and her colleagues, assesses 
the consistency of authors’ statistics reporting, 
focusing on P values. The journal Psychological 
Science runs all its papers through the tool, and 
Nuijten says that other publishers are keen to 
integrate it into their review processes. 
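
To make the idea of a consistency check concrete, here is a minimal sketch in Python of the general principle: recompute a P value from the reported test statistic and compare it with the P value the authors printed. This is not statcheck's code. The function names, the restriction to two-sided z-tests and the rounding tolerance are all assumptions made for illustration.

```python
import math


def two_sided_p_from_z(z: float) -> float:
    """Two-sided p-value of a z statistic under the standard normal."""
    return math.erfc(abs(z) / math.sqrt(2))


def is_consistent(z: float, reported_p: float, decimals: int = 2) -> bool:
    """True if reported_p equals the recomputed p-value up to rounding.

    Assumption for this sketch: authors round to `decimals` places, so any
    recomputed value within half a unit of the last place is accepted.
    """
    tolerance = 0.5 * 10 ** (-decimals) + 1e-12
    return abs(two_sided_p_from_z(z) - reported_p) <= tolerance


def changes_significance(z: float, reported_p: float, alpha: float = 0.05) -> bool:
    """True if the mismatch moves the result across the alpha threshold."""
    flipped = (two_sided_p_from_z(z) < alpha) != (reported_p < alpha)
    return flipped and not is_consistent(z, reported_p)


# Hypothetical reported results: (z statistic, p-value as printed).
for z, p in [(1.96, 0.05), (2.58, 0.01), (1.50, 0.03)]:
    print(z, p, is_consistent(z, p), changes_significance(z, p))
```

In this sketch, the third hypothetical result would be flagged twice: the printed P value does not match the statistic, and the mismatch is large enough to change whether the result counts as significant at the 0.05 level.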

When Nuijten’s team analysed papers 
published in psychology journals, they found 
that roughly 50% contained at least one statis- 
tical inconsistency (M. B. Nuijten et al. Behav. 
Res. Meth. 48, 1205–1226; 2016). In one in
eight papers, the error was serious enough that
it could have changed the statistical significance
of a published result. “That’s worrisome,”
she says. She's not surprised that reviewers miss 
such mistakes, however. “Not everyone has 
time to go over all the numbers. You focus on 
the main findings or the general story.” 

Automation of standardized tasks could take the slog out of peer review.

For now, statcheck is limited to analysing
manuscripts that use the American Psychological
Association's reporting style for statistics.
By contrast, the creators of StatReviewer,
Timothy Houle at Wake Forest University 
School of Medicine in North Carolina and 
Chadwick DeVoss, chief executive of tech start- 
up NEX7 in Madison, Wisconsin, say that their 
tool can assess statistics in standard formats 
and presentation styles from multiple fields. To 
do this, it checks that papers correctly include 
things such as sample sizes, information about 
blinding of experiments and baseline data. 


DETECTING FRAUD MARKERS 

StatReviewer can also identify markers of 
fraudulent behaviour, says DeVoss. “Things 
like, did they game some statistical rules, or
did they flat-out make up data? If the risk is
higher than what the journal is used to see- 
ing, they can look into the details.” DeVoss says 
that StatReviewer is being tested by dozens of 
publishers. A 2017 trial with the open-access 
publisher BioMed Central in London was 
inconclusive because the tool did not analyse 
enough manuscripts, but did nonetheless pro- 
vide some insights. BioMed Central is now 
planning a follow-up. 

StatReviewer did catch things that human 
reviewers missed, says Amy Bourke-Waite,
communications director for open research at
Springer Nature, which owns BioMed Central
and publishes Nature (Nature’s news team is editorially
independent of Springer Nature). For
example, it was good at catching papers that did
not meet required standards, such as following
CONSORT, a manuscript format used by many
publishers. Bourke-Waite adds that authors who
took part said that they were as happy responding
to StatReviewer reports as they were to
the human reviewers’. Occasionally, she says,
StatReviewer got things wrong — but sometimes
its slip-ups drew authors’ attention to
unclear reporting in their manuscripts.

Even if the trials prove successful, DeVoss
expects that only some journals will want to 
pay to have all their manuscripts scanned. So 
he and his colleagues are targeting authors, too, 
hoping that they will use the tool to check their 
manuscripts before submission. 

There are potential pitfalls to AI in peer
review in general. One concern is that 
machine-learning tools trained on previously 
published papers could reinforce existing 
biases in peer review. “If you build a decision- 
making system based on the articles which 
your journal has accepted in the past, it will 
have in-built biases,” says Worlock. And if an 
algorithm provides a single overall score after 
evaluating a paper, as StatReviewer does, there 
might be a temptation for editors to cut corners 
and simply rely on that score in deciding to 
reject a paper, says DeVoss. 

Algorithms are not yet smart enough to 
allow an editor to accept or reject a paper 
solely on the basis of the information they 
extract, says Andrew Preston, co-founder of 
Publons, a Wellington-based start-up acquired 
by Clarivate Analytics in Philadelphia, Penn- 
sylvania, that tracks peer review and is using 
machine learning to develop a tool to recom- 
mend reviewers. “These tools can make sure 
a manuscript is up to scratch, but in no way 
are they replacing what a reviewer would do 
in terms of evaluation.” 

Nuijten agrees: “The algorithms are going to 
need some time to perfect, but it makes sense 
to automate a lot of things, because a lot of 
things in peer review are standard.”
COMMUNITY


Can conference shed reputation 
for hosting sexist behaviour? 


AI meeting wants to become more inclusive, but survey suggests it has a long way to go.


BY HOLLY ELSE 


Hordes of artificial-intelligence
researchers will descend this weekend
on one of the field’s hottest
tickets: the Neural Information Processing
Systems conference in Montreal, Canada.
But although attendees at this annual event
will hear talks on cutting-edge ideas in com- 
puter science, another issue will also be front 
and centre: whether the conference can pro- 
vide a welcoming environment for women 
as the field of artificial intelligence (AI) 
grapples with a culture of harassment and 
discrimination. 

The concerns were thrown into stark relief 
earlier this month with the release of a survey
of 2,375 people — most of whom had either 
attended the meeting or submitted papers for 
consideration in previous years. 

Respondents reported experiencing sexual 
harassment, seeing the conference welcome 
sexist people and regularly hearing sexist or 
sexually abusive comments and jokes. Women
reported unwelcome, persistent advances
from men at the conference. The analysis does 
not reveal what percentages of respondents 
reported these experiences, but does say that 
15% of respondents were women. 

Terrence Sejnowski, president of the founda- 
tion that oversees the conference, told Nature 
that the foundation's board, and others, had 
read the report with great interest, and thanked 
the authors for the analysis. “It provides us with 
valuable information for understanding our 
community,” he said.


DIVERSITY MEASURES 

The survey was carried out by Katherine 
Heller, a machine-learning researcher at Duke 
University in Durham, North Carolina, and 
Hal Daumé, a machine-learning researcher 
at the University of Maryland in College Park, 
who are the diversity and inclusion chairs at 
this year’s event. 

In December 2017, Sejnowski and the chairs 
of the boards of the 2017 and 2018 conferences 
acknowledged that several events held at or in 
conjunction with the 2017 conference had 
fallen short of the standards required to “pro- 
vide an inclusive and welcoming environment 
for everyone”. They said that they would take 
immediate action, including recruiting the
diversity and inclusion chairs, formalizing the
process for reporting concerns and strength- 
ening an existing code of conduct, by which 
all attendees and sponsors will have to abide 
in future. 

Their statement came shortly after several 
female machine-learning researchers spoke 
out about their experiences at last year’s event 
in Long Beach, California, and other AI con- 
ferences, including a joke about sexual assault, 
allegedly made by a member of a band com- 
posed of leading researchers at a party coincid- 
ing with the 2017 event. 

Other measures to improve inclusion 
include subsidized childcare and a diversity 
meeting. There are also now several ways 
for conference-goers with concerns to notify 
organizers. 

And on 16 November, the board abandoned 
the commonly used acronym, NIPS, and 
renamed the event NeurIPS. A March 2018 
letter to the board, signed by 122 academics at 
Johns Hopkins University in Baltimore, Mar- 
yland, said the NIPS acronym was “prone to 
unwelcome puns” and revealed further goings- 
on at the conference, including an unofficial 
sister event named “TITS” and T-shirts spotted 
bearing the slogan “my NIPS are NP-hard”. 

Researchers have mixed views about 
whether the board’s efforts will bring meaningful
change. Raia Hadsell, a machine-learning
researcher at DeepMind in London who has
been attending the conference for more than
a decade, has not witnessed a “rampant culture
of discrimination, bias or harassment” at the
event but has seen and experienced problematic
behaviour. “I find it infuriating to be
asked whether I am a recruiter, or a ‘plus one’,
or whether I ‘did the work myself’ — do men
ever, ever get asked questions like that?” she 
says. 

She thinks that the machine-learning com- 
munity wants to address the problems, but that 
their complexity makes it difficult. “I think that 
there will still be a problem come December 
in Montreal.” 

Elana Fertig, a computational biologist at 
Johns Hopkins University who signed the 
March letter to the board, says that altering the 
name is a powerful first step that has height- 
ened awareness of the issues and shows that 
change is possible. But two of Fertig’s students 
decided earlier this year not to attend the event 
because of the reported culture. And she wor- 
ries about a backlash against the name change, 
noting that there were negative, sometimes 
threatening, comments that accompanied the 
debate over the change.


Alzheimer’s researchers 
seek better mice 


Several teams are developing animal models that more closely mimic the disease in people. 


BY SARA REARDON 


Drug companies have spent billions
of dollars searching for therapies
to reverse or significantly slow
Alzheimer’s disease, to no avail. Some 
researchers argue that the best way to make 
progress is to create better animal models for 
research, and several teams are now develop- 
ing mice that more closely simulate how the 
disease devastates people's brains. 

The US National Institutes of Health (NIH), 
the UK Dementia Research Institute and the 
Jackson Laboratory — one of the world’s 
biggest suppliers of laboratory mice — are 
among the groups trying to genetically engi- 
neer more-sophisticated rodents. Scientists 
are also probing the complex web of mutations 
that influence neurological decline in mice 
and people. 

“We appreciate that the models we had were
insufficient,” says Bruce Lamb, a neuroscientist
at Indiana University in Indianapolis who
directs the NIH-funded programme. “I think 
it’s sort of at a critical juncture right now.” 

Alzheimer’s is marked by cognitive decline 
and the build-up of amyloid-protein plaques 
in the brains of people, but the disease does 
not occur naturally in mice. Scientists
get around this by studying mice that
have been genetically modified to produce
high levels of human amyloid protein. These
mice develop brain plaques, but no memory 
problems. 

Many experimental drugs that have success- 
fully removed plaques from mouse brains have 
not lessened the symptoms of Alzheimer’s dis- 
ease in people. One high-profile stumble came 
last month, when three companies reported that 
their Alzheimer’s drugs — from a class called 
BACE inhibitors — had failed in late-stage
clinical trials. Although the drugs successfully
blocked the accumulation of amyloid protein 
in mice, they seemed to worsen cognitive 
decline and brain shrinkage in people. 

The drive for better mice comes as genomics 
studies are linking the most common form 
of Alzheimer’s — late onset — to dozens of 
different genes. This diversity suggests that 
each case of the disease is caused by a differ- 
ent mix of genetic and environmental factors. 
“There is no single Alzheimer’s disease,” says 
Gareth Howell, a neuroscientist at the Jackson 
Laboratory in Bar Harbor, Maine. 

Howell argues that scientists’ reliance on 
inbred lab mice with only a few engineered 
mutations might have limited research. His own 
work suggests that, in mice, just as in people, 
genetic diversity plays a part in determining 
how neurodegeneration progresses. 

When Howell’s team modified two genes 
associated with early-onset Alzheimer’s in 
both lab mice and their wild cousins, all of
the animals developed amyloid plaques.
But although the more-inbred lab mice did 
not display any outward signs of Alzhei- 
mer’s, a portion of the genetically diverse 
wild mice experienced memory problems. 
The researchers think that a combination of 
plaques and unknown genetic factors caused 
these symptoms. They presented the results 
this month at a meeting of the Society for 
Neuroscience in San Diego, California. 

Another study, by neuroscientist Catherine 
Kaczorowski at the Jackson Laboratory, sug- 
gests that animals’ genetic make-up affects how 
they respond to environmental triggers. Her 
group bred genetically diverse 
wild mice with lab mice that 
had mutations that cause 
amyloid plaques to form. 
Some of the resulting off- 
spring were more likely to 
develop cognitive prob- 
lems if they ate a high- 
fat diet, but other mice 
on the diet had a lower
risk of these symp- 
toms, Kaczorowski 
reported at the 
San Diego meeting. 


An Alzheimer’s 
model mouse. 


Understanding how this expanded universe 
of genetic factors affects Alzheimer’s risk will 
require a host of new animal models with dif- 
ferent combinations of mutations. Several 
efforts to engineer these next-generation mice 
are already under way. 

In 2016, the NIH started the MODEL-AD 
consortium to develop more Alzheimer’s 
mice and make them available to researchers. 
Project scientists engineer mice with different 
genetic mutations associated with early- or 
late-onset Alzheimer’s, and test the animals to 
see whether they display signs of the disease. 
They then post descriptions of each mouse
type in an online database. Lamb
says that the team has released 
about 30 mouse varieties, and 
received more than 500 orders for 
the animals from academic scien- 
tists and biotechnology firms. 
And in January, the UK 
Dementia Research Insti- 
tute in London launched a 
similar programme. Scien- 
tists there are developing 
model mice whose brains 
show the amyloid plaques 
and tangles of another pro- 
tein, called tau, that occur 
in people with Alzheimer’s.
To mimic the brain
inflammation that the disease causes, the group
is implanting neural immune cells grown from 
human stem cells into the brains of mice. 
Ultimately, researchers hope that the models 
will reveal ways to predict whether a person 
will respond to a particular Alzheimer’s ther- 
apy. And having a better understanding of 
how inflammation and genes drive the dis- 
ease could help to identify it in people before 
plaques and tangles have formed, says Rudolph 
Tanzi, a neurologist at Harvard University in 
Cambridge, Massachusetts. “That’s why it’s so 
important to have those animal models availa- 
ble and really start working on all these genes.” 
But Bart de Strooper, a molecular biologist at 
the Catholic University of Leuven in Belgium, 
urges caution. De Strooper, who directs the 
UK programme, says that none of the next- 
generation animals is likely to be a perfect 
analogue for people. “The biggest mistake you 
can make,” he says, “is to think you can ever 
have a mouse with Alzheimer’s disease.”


CORRECTION 

The News Feature ‘Why extreme rains are 
getting worse’ (Nature 563, 458-460; 2018) 
erroneously located Elizabeth Kendon in 
Reading. She is, in fact, at the Met Office in 
Exeter. 
KAYANA SZYMCZAK


THE SUN DIMMERS 


With dire climate scenarios on the horizon, researchers 
are getting serious about solar geoengineering. 


BY JEFF TOLLEFSON 


Zhen Dai holds up a small glass tube coated with a white powder:
calcium carbonate, a ubiquitous compound used in everything
from paper and cement to toothpaste and cake mixes. Plop a
tablet of it into water, and the result is a fizzy antacid that calms
the stomach. The question for Dai, a doctoral candidate at Harvard University
in Cambridge, Massachusetts, and her colleagues is whether this
innocuous substance could also help humanity to relieve the ultimate 
case of indigestion: global warming caused by greenhouse-gas pollution. 
The idea is simple: spray a bunch of particles into the stratosphere, 
and they will cool the planet by reflecting some of the Sun’s rays back 
into space. Scientists have already witnessed the principle in action. 
When Mount Pinatubo erupted in the Philippines in 1991, it injected 
an estimated 20 million tonnes of sulfur dioxide into the stratosphere 
— the atmospheric layer that stretches from about 10 to 50 kilometres 
above Earth's surface. The eruption created a haze of sulfate particles that 
cooled the planet by around 0.5°C. For about 18 months, Earth’s average 
temperature returned to what it was before the arrival of the steam engine. 
The idea that humans might turn down Earth's thermostat by simi- 
lar, artificial means is several decades old. It fits into a broader class of 
planet-cooling schemes known as geoengineering that have long
generated intense debate and, in some cases, fear.

Researchers have largely restricted their work 
on such tactics to computer models. Among the 
concerns is that dimming the Sun could backfire, 
or at least strongly disadvantage some areas of the world by, for example, 
robbing crops of sunlight and shifting rain patterns. 

But as emissions continue to rise and climate projections remain dire, 
conversations about geoengineering research are starting to gain more 
traction among scientists, policymakers and some environmentalists. 
That's because many researchers have come to the alarming conclusion 
that the only way to prevent the severe impacts of global warming will be 
either to suck massive amounts of carbon dioxide out of the atmosphere 
or to cool the planet artificially. Or, perhaps more likely, both. 

If all goes as planned, the Harvard team will be the first in the world 
to move solar geoengineering out of the lab and into the stratosphere, 
with a project called the Stratospheric Controlled Perturbation Experi- 
ment (SCoPEx). The first phase — a US$3-million test involving two 
flights of a steerable balloon 20 kilometres above the southwest United 
States — could launch as early as the first half of 2019. Once in place, the
experiment would release small plumes of calcium carbonate, each of
around 100 grams, roughly equivalent to the amount found in an average
bottle of off-the-shelf antacid. The balloon would then turn around to
observe how the particles disperse.

Frank Keutsch, Zhen Dai and David Keith (left to right) in Keutsch’s laboratory at Harvard University.

The test itself is extremely modest. Dai, whose doctoral work over the
past four years has involved building a tabletop device to simulate and
measure chemical reactions in the stratosphere in advance of the experi-
ment, does not stress about concerns over such research. “I’m studying a
chemical substance,” she says. “It’s not like it’s a nuclear bomb.”

Nevertheless, the experiment will be the first to fly under the banner of 
solar geoengineering. And so it is under intense scrutiny, including from 
some environmental groups, who say such efforts are a dangerous distrac- 
tion from addressing the only permanent solution to climate change: reduc- 
ing greenhouse-gas emissions. The scientific outcome of SCoPEx doesn't 
really matter, says Jim Thomas, co-executive director of the ETC Group, 
an environmental advocacy organization in Val-David, near Montreal, 
Canada, that opposes geoengineering: “This is as much an experiment in 
changing social norms and crossing a line as it is a science experiment.”

Aware of this attention, the team is moving slowly and is working to set 
up clear oversight for the experiment, in the form of an external advisory 
committee to review the project. Some say that such a framework, which 
could pave the way for future experiments, is even more important than 
the results of this one test. “SCoPEx is the first out of the gate, and it is 
triggering an important conversation about what independent guidance, 
advice and oversight should look like,” says Peter Frumhoff, chief climate 
scientist at the Union of Concerned Scientists in Cambridge, Massachusetts,
and a member of an independent panel that has been charged with
selecting the head of the advisory committee. “Getting it done right is far 
more important than getting it done quickly.” 


JOINING FORCES 

In many ways, the stratosphere is an ideal place to try to make the atmos- 
phere more reflective. Small particles injected there can spread around 
the globe and stay aloft for two years or more. If placed strategically and 
regularly in both hemispheres, they could create a relatively uniform 
blanket that would shield the entire planet (see ‘Global intervention’).
The process does not have to be wildly expensive; in a report last month,
the Intergovernmental Panel on Climate 
Change suggested that a fleet of high-flying 
aircraft could deposit enough sulfur to offset 
roughly 1.5°C of warming for around $1 billion 
to $10 billion per year.

Most of the solar geoengineering research so 
far has focused on sulfur dioxide, the same sub- 
stance released by Mount Pinatubo. But sulfur 
might not be the best candidate. In addition to 
cooling the planet, the aerosols generated in that 
eruption sped up the rate at which chlorofluoro- 
carbons deplete the ozone layer, which shields 
the planet from the Sun’s harmful ultraviolet 
radiation. Sulfate aerosols are also warmed by 
the Sun, enough to potentially affect the movement of moisture and even 
alter the jet stream. “There are all of these downstream effects that we don't 
fully understand,” says Frank Keutsch, an atmospheric chemist at Harvard
and SCoPEx’s principal investigator. 

The SCoPEx team’s initial stratospheric experiments will focus on 
calcium carbonate, which is expected to absorb less heat than sulfates 
and to have less impact on ozone. But textbook answers — and even Dai’s 
tabletop device — can’t capture the full picture. “We actually don't know 
what it would do, because it doesn’t exist in the stratosphere,” Keutsch 
says. “That sets up a red flag.”

SCoPEx aims to gather real-world data to sort this out. The experiment 
began as a partnership between atmospheric chemist James Anderson of 
Harvard and experimental physicist David Keith, who moved to the uni- 
versity in 2011. Keith has been investigating a variety of geoengineering 
options off and on for more than 25 years. In 2009, while at the University
of Calgary in Canada, he founded the company Carbon Engineering, in
Squamish, which is working to commercialize technology to remove
carbon dioxide from the atmosphere. After joining Harvard, Keith used 
research funding he had received from Microsoft co-founder Bill Gates 
to begin planning the experiment. 

Keutsch, who got involved later, is not a climate scientist and is at best 
a reluctant geoengineer. But he worries about where humanity is head- 
ing, and what that means for his children’s future. When he saw Keith 
talk about the SCoPEx idea at a conference after starting at Harvard in 
2015, he says his initial reaction was that the idea was “totally insane”. 
Then he decided it was time to engage. “I asked myself, an atmospheric 
chemist, what can I do?” He joined forces with Keith and Anderson, and 
has since taken the lead on the experimental work. 


AN EYE ON THE SKY 

Already, SCoPEx has moved farther along than earlier solar 
geoengineering efforts. The UK Stratospheric Particle Injection for 
Climate Engineering experiment, which sought to spray water 1 kilome- 
tre into the atmosphere, was cancelled in 2012 in part because scientists 
had applied for patents on an apparatus that could ultimately affect every 
human on the planet. (Keith says there will be no patents on any tech- 
nologies involved in the SCoPEx project.) And US researchers with the 
Marine Cloud Brightening Project, which aims to spray saltwater drop- 
lets into the lower atmosphere to increase the reflectivity of ocean clouds, 
have been trying to raise money for the project for nearly a decade. 

Although SCoPEx could be the first solar geoengineering experi- 
ment to fly, Keith says other projects that have not branded themselves 
as such have already provided useful data. In 2011, for example, the 
Eastern Pacific Emitted Aerosol Cloud Experiment pumped smoke into 
the lower atmosphere to mimic pollution from ships, which can cause 
clouds to brighten by capturing more water vapour. The test was used to 
study the effect on marine clouds, but the results had a direct bearing on 
geoengineering science: the brighter clouds produced a cooling effect 
50 times greater than the warming effect of the carbon emissions from 
the researchers’ ship.

Keith says that the Harvard team has yet to encounter public 
protests or any direct opposition — aside from the occasional con- 
spiracy theorist. The challenge facing researchers, he says, stems more 
from a fear among science-funding agencies 
that investing in geoengineering will lead to 
protests by environmentalists. 

To help advance the field, Keith set a goal in 
2016 of raising $20 million to support a formal 
research programme that would cover not just 
the experimental work, but also research into 
modelling, governance and ethics. He has raised 
around $12 million so far, mostly from philan- 
thropic sources such as Gates; the pot provides 
funding to dozens of people, largely on a part- 
time basis. 

Keith and Keutsch also want an external advi- 
sory committee to review SCoPEx before it flies. 
The committee, which is still to be selected, will report to the dean of engi- 
neering and the vice-provost for research at Harvard. “We see this as part 
of a process to build broader support for research on this topic,” Keith says.

Keutsch is looking forward to having the guidance of an exter- 
nal group, and hopes that it can provide clarity on how tests such as 
his should proceed. “This is a much more politically challenging 
experiment than I had anticipated,” he says. “I was a little naive.” 

SCoPEx faces technical challenges, too. It must spray particles of 
the right size: the team calculates that those with a diameter of about 
0.5 micrometres should disperse and reflect sunlight well. The balloon 
must also be able to reverse its course in the thin air so that it can pass 
through its own wake. Assuming the team is able to find the calcium 
carbonate plume — and there is no guarantee that they can — SCoPEx 
needs instruments that can analyse the particles and, it is hoped, carry 
samples back to Earth. 

“It's going to be a hard experiment, and it may not work,” says David
Fahey, an atmospheric scientist at the National Oceanic and Atmospheric
Administration in Boulder, Colorado. In the hope that it will, Fahey’s team
has provided SCoPEx with a lightweight instrument that can reliably measure
the size and number of particles that are released. The balloon will also be
equipped with a laser device that can monitor the plume from afar. Other
equipment that could collect information on the level of moisture and ozone
in the stratosphere could fly on the balloon as well.

Global intervention

One way to cool the planet quickly would be to make the sky block more
sunlight. But predicting the knock-on effects — both positive and negative
— remains a major challenge.

High-flying planes could release small particles into the stratosphere to
reflect incoming rays. One estimate says this could reduce global
temperatures by roughly 1.5 °C for less than US$10 billion a year.


UP TO THE STRATOSPHERE

Keutsch and Keith are still work- 
ing out some of the technical 
details. Plans with one balloon 
company fell through, so they are 
now working with a second. And 
an independent team of engineers 
in California is working on options 
for the sprayer. To simplify things, 
the SCoPEx group plans to fly 
the balloon during the spring or 
autumn, when stratospheric winds 
shift direction and — for a brief 
period — calm down, which will 
make it easier to track the plume. 

For all of these reasons, Keutsch 
characterizes the first flight as an 
engineering test, mainly intended 
to demonstrate that everything 
works as it should. The team is 
ready to spray calcium carbonate 
particles, but could instead use salt 
water to test the sprayer if the advisory 
committee objects. 

Keith still thinks that sulfate aerosols 
might ultimately be the best choice 
for solar geoengineering, if only because there has been more 
research about their impact. He says that the possibility of sulfates 
enhancing ozone depletion should become less of a concern in the future, 
as efforts to restore the ozone layer through pollutant reductions continue. 
Nevertheless, his main hope is to establish an experimental programme 
in which scientists can explore different aspects of solar geoengineering. 

There are a lot of outstanding questions. Some researchers have 
suggested that solar geoengineering could alter precipitation patterns 
and even lead to more droughts in some regions. Others warn that one of 
the possible benefits of solar geoengineering — maintaining crop yields 
by protecting them from heat stress — might not come to pass. In a study 
published in August, researchers found that yields of maize (corn), soya, 
rice and wheat fell after two volcanic eruptions, Mount Pinatubo in 1991
and El Chichón in Mexico in 1982, dimmed the skies. Such reductions
could be enough to cancel out any potential gains in the future. 

Keith says the science so far suggests that the benefits could well out- 
weigh the potential negative consequences, particularly compared with a 
world in which warming goes unchecked. The commonly cited drawback 
is that shielding the Sun doesn't affect emissions, so greenhouse-gas levels
would continue to rise and the ocean would grow even more acidic. But 
he suggests that solar geoengineering could reduce the amount of carbon 
that would otherwise end up in the atmosphere, including by minimizing 
the loss of permafrost, promoting forest growth and reducing the need 
to cool buildings. In an as-yet-unpublished analysis of precipitation and 
temperature extremes using a high-resolution climate model, Keith and
others found that nearly all regions of the world would benefit from a
moderate solar geoengineering programme. “Despite all of the concerns,
we can’t find any areas that would be definitely worse off,” he says. “If
solar geoengineering is as good as what is shown in these models, it
would be crazy not to take it seriously.”

Cooler temperatures and more scattered light could promote the growth of
forests and other ecosystems, locking more atmospheric carbon away.

Crops would benefit from reduced heat stress, but lower levels of direct
sunlight could hamper growth.

There is still widespread uncer- 
tainty about the state of the science 
and the assumptions in the mod- 
els — including the idea that 
humanity could come together to 
establish, maintain and then even- 
tually dismantle a well-designed 
geoengineering programme while 
tackling the underlying problem
of emissions. Still, prominent
organizations, including the UK
Royal Society and the US National 
Academies of Sciences, Engineer- 
ing, and Medicine, have called for 
more research. In October, the 
academies launched a project that 
will attempt to provide a blueprint 
for such a programme. 

Some organizations are already 
trying to promote discussions 
among policymakers and govern- 
ment officials at the international 
level. The Solar Radiation Man- 
agement Governance Initiative 
is holding workshops across the 
global south, for instance. And 
Janos Pasztor, who handled climate issues under former UN
secretary-general Ban Ki-moon, has been talking
to high-level government officials around the 
world in his role as head of the Carnegie Climate 
Geoengineering Governance Initiative, a non-profit 
organization based in New York. “Governments need to engage 
in this discussion and to understand these issues,” Pasztor says. “They
need to understand the risks — not just the risks of doing it, but also the
risks of not understanding and not knowing.”

One concern is that governments might one day panic over the 
consequences of global warming and rush forward with a haphazard 
solar-geoengineering programme, a distinct possibility given that the 
costs are cheap enough that many countries, and perhaps even a few 
individuals, could probably afford to go it alone. These and other ques- 
tions arose earlier this month in Quito, Ecuador, at the annual summit 
of the Montreal Protocol, which governs chemicals that damage the 
stratospheric ozone layer. Several countries called for a scientific assess- 
ment of the potential effects that solar geoengineering could have on 
the ozone layer, and on the stratosphere more broadly. 

If the world gets serious about geoengineering, Fahey says that there are
plenty of sophisticated experiments that researchers could do using satel- 
lites and high-flying aircraft. But for now, he says, SCoPEx will be valuable 
— if only because it pushes the conversation forward. “Not talking about
geoengineering is the greatest mistake we can make right now.”


Researchers aim to release calcium carbonate into the stratosphere next
year to test some aspects of solar geoengineering.


Dimmer skies could shift 
global precipitation 
patterns, pulling water 
resources from some areas. 


Without action on 
greenhouse gases, the 
oceans would continue 
to absorb carbon dioxide 
and grow more acidic. 


Jeff Tollefson is a reporter for Nature in New York City. 
DOES SCIENCE HAVE A BULLYING PROBLEM?


A spate of bullying allegations have rocked some high- 
profile science institutions. Here’s how researchers, 
universities and funders are dealing with the issue. 


BY HOLLY ELSE 
In August, accusations of bullying
roiled the Institute of Cancer 
Research in London, one of the 
leading science centres in the 
United Kingdom. A prominent 
cancer researcher there, geneti- 
cist Nazneen Rahman, resigned 
from the institute following an 
investigation into allegations 
that she had bullied her staff. 
And in an unprecedented move, the biomedi- 
cal charity the Wellcome Trust revoked £3.5 
million (US$4.5 million) of the funding it had 
given her. 

Three months on, many more people from 
Rahman's lab have left the institute. Yet most 
of the details about the case remain hidden 
from the public: Rahman has not commented 
about the allegations and the institute has 
released little information. It even withheld 
certain findings from the Wellcome Trust 
because they contained highly confidential 
personal information. The secrecy — and the 
resulting confusion — are prime examples of 
the difficulties that scientific institutions and 
researchers face in dealing with the thorny 
issue of bullying. 

The case is part of a spate of allegations that 
have rocked major scientific institutions in 
the past year. At Germany’s prestigious Max 
Planck Society, two directors were accused of 
bullying; and the UK-based Leverhulme Trust 
revoked £1 million in funding from palaeon- 
tologist Nicholas Longrich at the University of 
Bath following an investigation into bullying 
allegations. One of the world’s leading genom- 
ics centres, the Wellcome Sanger Institute in 
Hinxton, UK, has also investigated claims of 
bullying. But the decision to clear the Sanger’s 
management of this and other allegations has 
led some of those who complained to ques- 
tion the scope and extent of the probe. The 
Wellcome and other science funders, includ- 
ing Cancer Research UK (CRUK), have 
announced policies this year that prohibit 
bullying as well as other forms of harassment. 

The flurry of activity surrounding bullying 
has raised questions about how scientific 
organizations are run and how some research- 
ers conduct themselves. Here, Nature exam- 
ines what constitutes bullying, why so many 
accusations are arising and what impact it is 
having on research and on those who do it. 


What is bullying? 
Bullying between colleagues is commonly 
defined by psychologists, unions and work- 
place scholars as repeated and malicious mis- 
treatment of someone that results in harm. 
At its most obvious, this behaviour involves 
shouting, insulting or intimidating victims. 
But bullying can include more subtle actions, 
says Alison Antes, a workplace psychologist 
who studies researcher leadership and man- 
agement practices at Washington University 
in St Louis, Missouri. 

It can take the form of someone spreading
malicious rumours about another, undermining
their work and opinions, or withholding infor- 
mation necessary for them to do their jobs. 
Supervisors can become bullies if they are 
overbearing, constantly changing a person's 
duties or giving them impossible workloads 
or unachievable deadlines. These more subtle 
forms of malicious conduct can often cause the 
most problems because they tend to be difficult 
to detect and are open to differing interpreta- 
tions, says Antes. 

Some actions might fit into a grey zone. 
What one person considers firm management, 
another might consider bullying, says Antes. It 
is not difficult to imagine, for example, a PhD 
supervisor giving a student a raft of unfamil- 
iar experiments to complete, with a deadline 
that leaves the student stressed and working 
all night. Is this bullying? 

The answer depends on the broader
behaviour and approach, explains Loraleigh
Keashly, a communications scientist at Wayne
State University in Detroit, Michigan. Tough
supervisors are not bullies if they set up clear
expectations and communicate them directly
to reportees. They will also acknowledge and
appreciate staff members who meet those
expectations. If employees do not achieve
their goals, good supervisors will give specific
and constructive feedback, she says.

Naomi Ellemers, a social psychologist at
Utrecht University in the Netherlands who
has studied how people are treated in academia,
adds that supervisors on the right side
of the line will give people the time, support
and resources to achieve their goals, and treat
them respectfully.

A bully, by contrast, is typically not interested
in developing relationships that allow
their subordinates to grow professionally,
says Keashly. They might also dish out bullying
behaviour on a whim, whether or not the
person they are targeting has failed to perform
well, she adds.


How common is it in research — and is it getting worse?

Nobody knows how much bullying goes on in
science, because few people have investigated
the issue. Studies of bullying in workplaces
began only in the 1990s, and some researchers
have yet to examine what goes on in their
own back yards.

But Keashly thinks that this needs to
change so that the behaviour can be better
managed.

Her research, which draws on published
evidence of bullying in academia from
around the world, suggests that, in general,
one-quarter to one-third of academics say
that they have been bullied in the past year.
Around 40% say that they have witnessed or
heard about bullying happening to someone
else. This is considerably higher than
the reports of bullying in the general workplace.
Studies in the United States report that
10-14% of people in the general working
population say that they have experienced
bullying over the previous year.

One of the largest studies of bullying in
universities — surveying 14,000 higher-education
staff — was published by the UK
University and College Union in 2012 (ref. 2).
It found that the rate of bullying varied hugely
among the 92 institutions surveyed. Between
2% and 19% of staff at each university said
they were always or often subject to bullying
at work.

Universities came out better than average in
an earlier survey, this one published in 2000
and sponsored by the British Occupational
Health Research Foundation, which included
5,288 workers in 15 fields. Just 7% of the
483 respondents who work in higher educa- 
tion say that they are occasionally or regularly 
bullied — the third-lowest score of all the pro- 
fessions looked at. (Only retailing and manu- 
facturing has less bullying.) But Cary Cooper, 
a workplace psychologist at the University of 
Manchester, UK, who co-authored the study, 
says that this under-represents the true prob- 
lem in universities. His survey had a relatively 
strict definition of bullying: workers qualified 
as being bullied if they had experienced per- 
sistent demeaning and devaluing treatment. 

For comparison, a study of bullying in 
neonatal intensive-care units at 17 Greek 
hospitals found that more than half of the 
almost 400 doctors and nurses surveyed had 
experienced bullying.

And an online survey of more than 
1,000 US adults conducted last April reported 
that 19% had experienced bullying at work.

Because there are so few data about 
bullying in research, and specifically science, 
Keashly and other researchers say it is not
clear whether the problem is getting worse. 
Matt Waddup, head of policy at the UK Uni- 
versity and College Union in London, says 
that bullying is not always easy to pin down 
in cases that come to the union, because it 
is often a component of other problems that 
members have. But he thinks it is on the rise. 

Part of the reason could be that people 
across society are reconsidering what types of 
behaviours are acceptable. Ellemers says that 
the #metoo movement has made it a little eas- 
ier for those in low-power positions to report 
bullying, harassment and other inappropriate 
behaviours exerted by those above them. It 
has also spurred those in charge to take 
action instead of dismissing or ignoring the 
complaints, which often happened previously,
she says. 

Karen Vousden, chief scientist at CRUK in 
London, which recently introduced an anti- 
bullying policy for the labs it funds, says that 
society at large is now discussing these issues. 
“This isn’t peculiar to science, we will see it in 
all walks of life,” she says. 


What contributes to bullying in science?

For the most part, says Antes, principal 
investigators generally “love what they do 
and do the right thing”. But there are clearly 
exceptions — and certain factors in scien- 
tific research seem to encourage what some 
academics call abusive supervision. 

Lab heads wield a lot of power over their 
trainees — students and postdocs — who 
depend on them for help, recommendations 
and opportunities, says Ellemers. This type 
of dependence and hierarchical structure can 
allow people to get away with bullying because 
it makes it difficult for those targeted or watch- 
ing to confront the perpetrator, raise it with 
more senior colleagues or simply walk out. As 
a result, bullying can continue unchallenged 
for a long time, she says. 

And bullying is not always malicious: the 
intense pressure to get grants, results and pub- 
lications can push people to behave in prob- 
lematic ways unintentionally, adds Antes. 

According to another idea, science is 
susceptible to bullying partly because of 
the types of people who tend to choose that 
career. “In academia you do deal with a lot of 
individuals who are very intelligent but also 
have large egos,” says Matthew Martin, who
studies bullying at West Virginia University,
in Morgantown. And some egocentric people
might be more prone to bullying because
they are unconcerned with others’ feelings,
he proposes.

Often in science, there can be only a handful
of people who are experts in a specific field,
so junior researchers who experience bullying
might think that it is worth putting up with the
behaviour because in the long run it will pay
off for them, explains Antes. “Your career success
starts to be woven around their success,”
she says, making it even harder to speak out
about poor behaviour.

And some researchers could have spent their
early careers in a lab where bullying behaviour
was the norm. They might be trying to use
these tactics on their staff because they think
that is what made them successful. So the bullying
behaviours are actually coming from a
place of care, says Antes — a perception that
this will help others.


What are scientific institutions doing about it?

The majority of UK universities have policies
that prohibit bullying and harassment,
says Waddup. These documents typically
include definitions and examples and they
advise on what to do if someone encounters
such problems.

In July, the University of Bath reprimanded
Longrich after it found that he had violated
its dignity and respect policy. The institution
issued him with a verbal warning and made
changes to his ‘supervisory arrangements’.
Some people who have worked with Longrich
feel that the university’s initial actions
did not go far enough. Subsequently, the Leverhulme
Trust, which had funded Longrich,
revoked his £1-million grant. Longrich has
not responded to Nature’s repeated requests
for comment. A University of Bath spokesperson
told Nature: “Our HR procedures ensure
people involved are treated reasonably, consistently
and fairly.”

In general, having policies is not enough,
says C. K. Gunsalus, a specialist in research
integrity at the University of Illinois at Urbana-Champaign.
To stamp out bad behaviour, leaders
need to apply policies consistently and show
that bullying has consequences, she says. “One
of the worst things you can do is start the process
and abandon it. It reinforces the problem.”

Bullying policies vary widely around the
world. They are less common at universities
and other institutions in the United States than
in the United Kingdom. Unpublished research
by Leah Hollis at Morgan State University in
Baltimore, Maryland, who studies bullying in
higher education, suggests that only around
one-fifth of institutions have such policies that
are easy to find. In France, workplace bullying
is referred to as ‘moral harassment’ and is
illegal; similar laws exist in Australia, Sweden, 
Belgium and several Canadian provinces. 

As harassment and bullying accusations 
have captured more attention in the past year, 
several major science funders have stepped 
up to develop policies. The Wellcome Trust’s 
policy specifically prohibits bullying as well as 
harassment. Some other funders are less clear. 
Neither the US National Science Foundation 
nor the US National Institutes of Health specif- 
ically mentions bullying in its anti-harassment 
policies. 


What is the effect on science and scientists?

No one knows whether bullying has a 
negative impact on science — but Antes 
suspects it does. “Maybe some people can 
thrive in that environment, but I don’t think 
most people do,” she says.

Those who are bullied are more likely to 
be distracted and make mistakes, says Keas- 
hly. At worst, bullying can contribute to 
long-term problems with mental or physical 
health. That has an impact beyond the victim 
themselves, eroding the creativity, productiv- 
ity and well-being of an entire lab. 

After Rahman resigned and the Wellcome 
revoked her funding, the upheaval had rip- 
ple effects. The Institute of Cancer Research 
says that it followed standard processes for 
when a team leader leaves. Only one-third of 
the 15 people in her research team still work 
at the institute. 

One concern about bullying is that it can 
drive people away from science permanently, 
especially those who were the targets, says 
Vousden. “Our workforce is incredibly pre- 
cious. We spend huge amounts of time on 
mentoring and funding people,” she says. 
“Our scientists are in some degree our most 
valuable component.” 


What needs to be done? 

The next big job for institutions, says 
Vousden, is to create an open and supportive 
atmosphere in which people feel comfortable
enough to bring up any concerns in a
non-confrontational way. This can help to 
prevent situations from escalating “to the 
point where you have 50 people making 
complaints about 10 years of behaviour”,
she says. 

Towards this goal, CRUK will be auditing 
the institutions it funds to check that they are 
adhering to its anti-bullying policy. 

Another important step is for universities 
to offer training to scientists who assume 
management roles, says Cooper. Institutions 
should also reward researchers for taking on 
management tasks. 

Hollis says that institutions without bullying 
policies should develop and put them in place. 
Crucially, they then need to follow the pro- 
cedures. “It sounds simple, but many schools 
don't follow such policies,” she says. 

And the policies must apply “regardless 
of whether the bully is a vice-president or 
grounds workers”, says Hollis. “Bullying
occurs because the organization allows it to 
occur.”


Holly Else is a reporter with Nature in 
London. 
Statistical pitfalls of personalized medicine


Misleading terminology and arbitrary divisions stymie drug trials and can give false 
hope about the potential of tailoring drugs to individuals, warns Stephen Senn. 


Personalized medicine aims to match
individuals with the therapy that
is best suited to them and their
condition. Advocates proclaim the poten- 
tial of this approach to improve treatment 
outcomes by pointing to statistics about 
how most drugs — for conditions ranging 
from arthritis to heartburn — do not work 
for most people. That might or might not
be true, but the statistics are being mis- 
interpreted. There is no reason to think 
that a drug that shows itself to be marginally 
effective in a general population is simply 
in want of an appropriate subpopulation in 
which it will perform spectacularly. 


The reasoning follows a familiar, flawed 
pattern. If more people receiving a drug 
improve compared with those who are given 
a placebo, then the subset of individuals who 
improved is believed to be somehow special. 
The problem is that the distinction between 
these ‘responders’ and ‘non-responders’ can 
be arbitrary and illusory. 

Much effort then goes into trying
to uncover a trait to explain this differential
response, without assessing whether
or not such a differential exists. I think 
that this is one of many reasons why a 
large proportion of biomarkers thought 
to distinguish patient subgroups fall flat. 


Researchers need to be much more careful. 
To be clear, I am not talking about research, 
often in cancer, that defines subpopulations 
of patients in advance. In that scenario, the 
aim is to test prospectively whether a par- 
ticular drug works better (or worse) in people 
whose cancer cells have a specific genetic 
defect — a biomarker such as a HER2 muta- 
tion in breast cancer or the BCR-ABL fusion 
gene in leukaemia. (It’s worth stating that 
the overall percentage of US patients with 
advanced or metastatic cancer who benefit 
from such ‘genome-informed’ cancer drugs 
is estimated to be less than 7% at best’; the 
proportion is likely to be lower for those
whose cancer is at an earlier stage.)


29 NOVEMBER 2018 | VOL 563 | NATURE | 619


© 2018 Springer Nature Limited. All rights reserved.

What I take issue with is the de facto 
assumption — often made in studies of 
chronic diseases such as migraine and asthma 
— that the differential response to a drug is 
consistent for each individual, predictable 
and based on some stable property, such as a 
yet-to-be-discovered genetic variant. 

Consider an actual clinical trial in which 
71 patients were treated with two doses. 
Twenty ‘responded’ to both doses, 29 to 
neither dose and 14 to the higher dose, but 
not the lower one. That is as expected. More 
surprising is that eight ‘responded’ to the 
lower dose and not the higher one, which 
is at odds with how drugs are known to 
work. The most likely explanation is that the 
‘response’ is not a permanent characteristic 
of a person receiving the treatment; rather, 
it varies from occasion to occasion. In this 
example, the fact that two doses of the same 
drug were being compared alerts us to the 
need to consider that source of variability. 
If the comparison instead involved different 
molecules, researchers might then overlook 
the explanation of occasion-to-occasion 
variation and jump to the conclusion that the 
results must reflect a differential response. 
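
The counts quoted in this example can be tabulated to make the inconsistency explicit. What follows is a minimal Python sketch that uses only the four numbers given in the text; the variable names are illustrative assumptions and are not taken from the trial report.

```python
# Cross-tabulation of the 71 patients described in the text.
# Keys are (responded to lower dose, responded to higher dose).
counts = {
    (True, True): 20,    # 'responded' to both doses
    (False, False): 29,  # 'responded' to neither dose
    (False, True): 14,   # higher dose only: the expected pattern
    (True, False): 8,    # lower dose only: hard to explain as a fixed trait
}

total = sum(counts.values())
lower_dose_responders = sum(n for (low, _), n in counts.items() if low)
higher_dose_responders = sum(n for (_, high), n in counts.items() if high)
discordant = counts[(False, True)] + counts[(True, False)]

print(f"patients: {total}")
print(f"'responders' at lower dose:  {lower_dose_responders}")
print(f"'responders' at higher dose: {higher_dose_responders}")
print(f"patients whose label changed between doses: {discordant}")
```

If 'response' were a stable property of each person, and the higher dose were at least as effective as the lower one, the last cell of the table would be empty. Instead it holds 8 of the 22 patients whose label changed between doses.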

I have seen unsubstantiated interpretations
waft through the literature. They start with 
trials designed to show whether a drug works, 
and then get misinterpreted. For example, a 
2005 study found that one ulcer treatment 
led to healing in 96% of patients after 8 weeks, 
and another treatment healed 92% of patients, 
a difference of 4% (ref. 3). This finding filtered 
into a 2006 meta-analysis’, and then a third 
article’ followed an all-too-common statisti- 
cal practice, stating that only 1 in 25 (or 4%) 
of patients would benefit from the first ulcer 
treatment. It is not hard to imagine other 
researchers carrying out futile work to try to 
understand why. 


TRIAL TRAPS 


Here are some common pitfalls. 


Lazy language. Participants in clinical trials 
are often categorized as being responders or 
non-responders on the basis of an arbitrary 
measure of improvement — such as a certain
percentage drop in established clinical scales 
that assess depression or schizophrenia. It 
does not necessarily follow that any individ- 
ual who improves owes that improvement 
to the treatment. Researchers who acknowl- 
edge in the methods section of a paper that 
an observed change is not a proven effect of 
a drug often forget to make that distinction 
in the discussion. Variations are uncritically 
attributed to characteristics of the person 
receiving treatment rather than to numerous 
other possibilities. 


COMPARE EACH PATIENT AT LEAST TWICE

To find out whether a drug works better for some people, researchers can compare it to a placebo in those
individuals more than once. (All data here are simulated.)

JUST ONE TEST: DOES THE DRUG WORK?

A single comparison to a placebo that
gives results such as these suggests only
that, overall, the drug works better than
the placebo.


Arbitrary dichotomies. Other classifications
can depend on whether a participant
falls on one side or another of a boundary on
a continuous measurement. For example, a
person with multiple sclerosis who relapsed 
364 days after treatment is a non-responder; 
one who relapses 365 days after treatment is 
a responder. This is simplistic — it recasts 
differences of degree as differences of kind. 
Worse, it causes an unfortunate loss of infor- 
mation, and means that clinical trials must 
enrol more participants than would otherwise 
be needed to reach a sound conclusion”*. 
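
The size of this information loss can be illustrated with a standard large-sample approximation. The sketch below is not taken from the article: it assumes a normally distributed outcome, a small shift in the mean caused by treatment, and a cut-off expressed in standard deviations from the control mean. Under those assumptions, the efficiency of comparing proportions above the cut-off, relative to comparing means, is the squared normal density at the cut-off divided by p(1 - p), where p is the proportion below the cut-off. The function names are hypothetical.

```python
import math


def normal_pdf(z):
    """Density of the standard normal distribution at z."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Cumulative probability of the standard normal distribution at z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def relative_efficiency(cutoff_z):
    """Approximate efficiency of a dichotomized analysis versus a
    comparison of means (assumes normal data and a small shift)."""
    p = normal_cdf(cutoff_z)
    return normal_pdf(cutoff_z) ** 2 / (p * (1.0 - p))


def sample_size_inflation(cutoff_z):
    """Factor by which enrolment must grow to offset the lost information."""
    return 1.0 / relative_efficiency(cutoff_z)


for z in (0.0, 0.5, 1.0, 1.5):
    print(f"cut-off at {z:+.1f} SD: efficiency {relative_efficiency(z):.2f}, "
          f"sample size x{sample_size_inflation(z):.2f}")
```

Under these assumptions the best case is a cut-off at the median, where the efficiency equals 2/π, and the penalty grows as the cut-off moves into the tails.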


Participants’ variability. Physiology 
fluctuates. Trial participants are often 
labelled as responders after one meas- 
urement, post-treatment, with the tacit 
assumption that the same treatment in the 
same person on another occasion would 
yield the same observation. But repeated 
observations of the same person with a dis- 
ease such as asthma or high blood pressure 
show that the result after treatment can vary. 


Inappropriate yardsticks. Judging whether 
a drug works depends on making assump- 
tions about what would have happened 
without the treatment — a counterfactual. 
One common technique for estimating the 
counterfactual is to take baseline measure- 
ments; for instance, the volume of air that 
people with asthma can force from their 
lungs in one second at the start of a trial. But
baselines are a poor choice of counterfactual. 
Guidelines agreed by drug regulators in the 
European Union, Japan and the United 
States disparage their use as controls. 

There are many reasons besides treatment 
— such as regression to the mean or varia- 
tion in clinical settings — that might explain 
a difference from baseline, especially if meas- 
urements such as elevated blood pressure or 
reduced lung capacity are used to determine 
who can enrol in a clinical trial. Let’s say 
Patient X was enrolled in a trial after meeting
the criteria for having a blood-pressure
measurement of more than 130/90 mm Hg.
She is given a drug, after which her blood
pressure measures 120/80 mm Hg. One possibility
is that the drug affected her blood
pressure. Another is that 125/85 mm Hg
(or some other intermediate value) is her
mean blood pressure, and that she had a bad
day on enrolment and a good day later. Yet
another possibility is that her blood pressure
was measured at different times of the day, at
different places or by different people.


TWO TESTS: DOES THE DRUG WORK BETTER FOR SOME?

Many comparisons can show whether individuals respond in
the same way to a drug each time.

DID A PATIENT RESPOND TO TREATMENT?
Yes, in both comparisons; yes, but just in one; no.
(Axis label: Comparison 2, with the cut-off for response marked.)

CONSISTENT RESPONSE
There is potential for tailored therapy.

INCONSISTENT RESPONSE
Futile to try to pinpoint
a subset of responders.

For measurements such as pain scores and 
cholesterol levels, predictions for individuals 
— based on an average of all participants — 
can be more accurate than predictions based 
on an individual’s own data taken just once’. 
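
The Patient X scenario can be made concrete with a small calculation of regression to the mean. This is a sketch under stated assumptions, none of which come from the article: systolic pressure is treated as normally distributed, each person has a stable true mean, each reading adds independent occasion-to-occasion noise, the drug has no effect at all, and the numerical values (population mean 125, between-person SD 10, within-person SD 8, entry threshold 130) are hypothetical.

```python
import math


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_enrolment_and_follow_up(pop_mean, between_sd, within_sd, threshold):
    """Expected readings for people enrolled because one reading exceeded
    the threshold, assuming the treatment does nothing."""
    total_sd = math.hypot(between_sd, within_sd)
    z = (threshold - pop_mean) / total_sd
    # Mean of a normal distribution truncated below at the threshold.
    enrolment = pop_mean + total_sd * normal_pdf(z) / (1.0 - normal_cdf(z))
    # Share of the variance that reflects stable differences between people.
    reliability = between_sd ** 2 / total_sd ** 2
    # A second reading is pulled back towards the population mean.
    follow_up = pop_mean + reliability * (enrolment - pop_mean)
    return enrolment, follow_up


enrolment, follow_up = expected_enrolment_and_follow_up(
    pop_mean=125.0, between_sd=10.0, within_sd=8.0, threshold=130.0
)
print(f"expected reading at enrolment: {enrolment:.1f} mm Hg")
print(f"expected reading at follow-up: {follow_up:.1f} mm Hg")
print(f"apparent 'improvement' with an inert drug: {enrolment - follow_up:.1f} mm Hg")
```

The drop between the two expected readings arises purely from selecting people on a noisy measurement, which is why a change from baseline is a poor stand-in for the counterfactual.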


Rates of response. Suppose that in a large trial 
for an antidepressant, 30% of patients have a 
satisfactory outcome in terms of their score 
on the Hamilton Depression Rating Scale 
after taking a placebo, and 50% show a satis- 
factory outcome after taking the drug. This 
means that the probability of a good outcome
observed with the drug is 20 percentage points higher than
with the placebo. Or put another way, on aver- 
age, if five patients were treated with the drug, 
one more would experience a satisfactory 
outcome. This statistic is an example of what 
is called the ‘number needed to treat’ (NNT). 

This concept was introduced 30 years 
ago® and is extremely popular in evidence- 
based medicine and assessments of health 
technology. Unfortunately, NNTs are often 
falsely interpreted. Consider a trial compar- 
ing paracetamol to a placebo for treating ten- 
sion headache. After 2 hours, 50% of people 
treated with the placebo are pain-free, as are 
60% of those who were treated with para- 
cetamol. The difference is 10% and the NNT 
is 10. However, if paracetamol works for 100% 
of participants in 60% of the times they are 
treated, it will give the same NNT as if it works 
for 60% of the participants 100% of the time. 


S. SENN 


A high NNT should not be taken to imply 
that a drug works really well for a specific, 
narrow subset of people. It could simply 
mean that a drug is just not that effective 
across all individuals. 
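
The arithmetic behind these examples is short enough to check directly. The sketch below uses the percentages quoted in the text; the function names are hypothetical, and the two paracetamol scenarios are modelled in the simplest possible way, as a per-occasion probability of being pain-free for each of 100 imagined participants.

```python
from fractions import Fraction


def number_needed_to_treat(rate_treated, rate_control):
    """NNT is the reciprocal of the difference in outcome rates."""
    difference = rate_treated - rate_control
    if difference <= 0:
        raise ValueError("treatment rate must exceed control rate")
    return 1 / difference


def overall_rate(per_patient_probabilities):
    """Average chance of a good outcome across a group of patients."""
    return sum(per_patient_probabilities, Fraction(0)) / len(per_patient_probabilities)


# Antidepressant example: 50% on drug versus 30% on placebo.
antidepressant_nnt = number_needed_to_treat(Fraction(50, 100), Fraction(30, 100))

# Paracetamol example: placebo rate of 50%.
placebo_rate = Fraction(50, 100)

# Scenario A: every participant is pain-free on 60% of occasions.
everyone_sometimes = [Fraction(60, 100)] * 100
# Scenario B: 60% of participants are always pain-free, 40% never are.
some_always = [Fraction(1)] * 60 + [Fraction(0)] * 40

nnt_a = number_needed_to_treat(overall_rate(everyone_sometimes), placebo_rate)
nnt_b = number_needed_to_treat(overall_rate(some_always), placebo_rate)

print(f"antidepressant NNT: {antidepressant_nnt}")
print(f"paracetamol NNT, scenario A: {nnt_a}")
print(f"paracetamol NNT, scenario B: {nnt_b}")
```

Both scenarios give the same overall rate and therefore the same NNT, so the statistic on its own cannot tell a drug that helps everyone some of the time from one that helps some people all of the time.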


Subsequence, not consequence. All of the 
errors discussed so far lead to the assumption 
that what has happened, for good or ill, has 
been caused by what was done before — that 
if a headache disappeared, it was because of
the drug. It is ironic that the evidence-based- 
medicine movement, which has done so 
much to enthrone the randomized clini- 
cal trial as a principled and cautious way of 
establishing causation across populations, 
consistently fails to establish causation in the 
context of personalized medicine. 


WAY FORWARD 

These warnings are not intended to 
discourage researchers from pursuing pre- 
cision medicine. Rather, they are meant to 
encourage them to get a better sense of its 
potential at the outset. 

How to improve? One thing we need more
of is N-of-1 trials. These studies repeatedly
test multiple treatments in the same person, 
including the same treatment multiple times 
(see ‘Compare each patient at least twice’). 

With such designs, we can assess 
differences between the same drug being 
administered on many occasions, and com- 
pare those data with differences seen when 
different drugs are administered in the 
same way. They are being used, for example,
in trials of fentanyl for pain control in 
individuals with cancer’ and of temazepam 
for people with sleep disturbances”. 
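
One way to see what repeated comparisons buy is to split the variation in the measured drug effect into a part that is stable within a person and a part that changes from occasion to occasion. The sketch below is a simplified illustration and not the analysis used in the cited trials: it assumes that each patient contributes exactly two drug-minus-placebo differences (one per treatment cycle), that there is no systematic difference between cycles, and that the data shown are invented.

```python
from statistics import mean


def variance_components(cycle1, cycle2):
    """Return (between_patient, within_patient) variance estimates from two
    drug-minus-placebo differences per patient.

    between_patient: covariance of the two cycles across patients, which
    estimates how much the true effect differs from person to person.
    within_patient: half the mean squared difference between cycles, which
    estimates occasion-to-occasion variability.
    """
    n = len(cycle1)
    if n != len(cycle2) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 patients")
    m1, m2 = mean(cycle1), mean(cycle2)
    covariance = sum((a - m1) * (b - m2) for a, b in zip(cycle1, cycle2)) / (n - 1)
    between_patient = max(covariance, 0.0)  # a variance cannot be negative
    within_patient = sum((a - b) ** 2 for a, b in zip(cycle1, cycle2)) / (2 * n)
    return between_patient, within_patient


# Invented data: each patient shows the same benefit in both cycles.
consistent = variance_components([2.0, 0.0, 3.0, -1.0], [2.0, 0.0, 3.0, -1.0])

# Invented data: the benefit in one cycle says nothing about the next.
inconsistent = variance_components([2.0, 0.0, 3.0, -1.0], [0.0, 2.0, -1.0, 3.0])

print("consistent response:  ", consistent)
print("inconsistent response:", inconsistent)
```

A large between-patient component relative to the within-patient one suggests that there is something stable to personalize; when the between-patient component is near zero, searching for a subset of responders is unlikely to pay off.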

When medicines are given on many 
occasions for a chronic or recurring con- 
dition, N-of-1 studies are a good way of 
establishing the scope for personalized 
medicine’’. When drugs are given once or 
infrequently for degenerative or fatal conditions,
careful modelling of repeated measures
can help. Whatever their approach,
trial designers must hunt down sources of
variation. To work out how much of the
change observed is due to variability
within individuals requires more
careful design and analysis”.

Another advance would be to drop
the use of dichotomies’. Statistical analysis
of continuous measurements is straight- 
forward but underused. More-widespread 
uptake of this approach would mean that 
clinical trials could enrol fewer patients and 
still collect more information’. 

Perhaps the most straightforward 
adjustment would be to avoid labels such 
as ‘responder’ that encourage researchers 
to put trial participants in arbitrary 
categories. An alternative term — perhaps 
‘clinical improvement’ or ‘satisfactory endpoint’
— might help. Better still, sticking
with the actual measurement would reduce 
the peril of all the pitfalls mentioned here. 


It has been a long, hard struggle in 
medicine to convince researchers, regulators 
and patients that causality is hard to study 
and difficult to prove. We are in danger of 
forgetting at the level of the individual what 
we have learnt at the level of the population. 
Realizing that the scope for personalized 
medicine might be smaller than we have 
assumed over the past 20 years will help us 
to concentrate our resources more carefully. 
Ironically, this could also help us to achieve 
our goals.


Stephen Senn was formerly head of the 
Competence Center for Methodology and 
Statistics at the Luxembourg Institute of 
Health. 

e-mail: stephen@senns.demon.co.uk 


Why academic freedom is 
needed more than ever 


For a century, the Haldane principle has enabled government scientists to speak
truth to power without fear of retribution — cherish it, urges Ehsan Masood. 


One hundred years ago this month,
shortly after the guns of the First
World War fell silent, a German-speaking
Scottish lawyer-turned-politician
sent an 80-page report to his prime minister. 
In it was an idea whose echo still shapes the 
way in which many nations fund research 
— an idea arguably as important to the soul 
of modern science as the secular state is to 
modern democracy. 

That idea has come to be called the Haldane 
principle, after its proponent, Richard Burdon 
Haldane. This principle says that scientists 
should mostly be left alone to decide which 
research projects should receive government
funding'®. (It is not to be confused with the 
rule about speciation, formulated by evolu- 
tionary biologist J. B. S. Haldane.) In many 
nations, the Haldane principle is near-totemic 
— regarded as the scholar’s last defence 
against more powerful interests. 

But the definition used today does not 
reflect the depth of vision in the original. 
Haldane argued in his 1918 report’ that 
politicians need to do more than stay out 
of funding decisions. He urged them to lis- 
ten to expertise, and to take time to think 
and reflect before reaching a conclusion. 
And he wrote that politicians who ask 
scientists for advice should resist telling
them what that advice should be. 

The difference matters. Today, from 
Istanbul to Islamabad, from Rome to Rio de 
Janeiro, a parade of authoritarian leaders is 
advancing policies that fly in the face of evi- 
dence — on energy, emissions, the environ- 
ment, economics, immigration and more. 
Worse, these leaders are demanding that 
academics march to the beat of their drums. 

Even in seemingly healthy democracies, 
the direction of travel is unmistakable. In 
the United Kingdom last year, a ‘Haldane 
principle’ was passed into law for the first 
time — but as part of a package of measures 
that saw universities lose the protection
of the royal charters that have enshrined
scholarly autonomy for centuries’. 

Today, it is researchers who are demand- 
ing protection for their ability to speak 
truth to power. What is remarkable about 
Haldane’s incarnation of this idea is that it 
originated, not from scientists, but from the 
heart of government in the darkest of times. 


A NEW WORLD

Britain in 1918 was a different place. The 
small island nation controlled about one-fifth 
of the world’s population and one-quarter of 
global land area from Canada to New Zea- 
land, including many nations in Africa, Asia 
and the Middle East. 

The First World War had been a wake-up 
call. In 1917, Haldane was asked by the UK 
government to chair the Orwellian-sounding 
Machinery of Government Committee. Its 
remit was to re-engineer Britain's ministries 
to cope with peacetime, the rising powers of 
Germany and the United States, and
the citizens of various colonial states,
who would soon be demanding freedom.
The seven-member committee included Edwin
Montagu, secretary of state for India, and
the economist and social reformer Beatrice 
Webb, a co-founder of the London School of 
Economics and Political Science. 

Haldane was an unusual politician but 
the ideal chairperson for this task. He had 
top-level experience of government as Lord 
Chancellor, the person responsible for the 
country’s judiciary. And he was available: 
he’d lost the post of minister for war in 1915
after a campaign in the popular press painted 
him as a German sympathizer. He had praised 
the organization of education and science in 
Germany, noting how its culture, industry, 
research and policymaking came together. 
For this, the newspapers called him an enemy 
of the British state. He was doorstepped by 
journalists and insulted on the street’. 

Haldane could just as easily have worked in 
academia. He studied philosophy at the University
of Göttingen in Germany and wrote
several books on the topic, including The 
Reign of Relativity in 1921, about the implica- 
tions of Albert Einstein's physics. But his heart 
lay in public policy. He was an early advocate 
of expanding education and, after he left the 
government, helped to create a wave of civic 
universities in cities including Birmingham, 
Bristol, Leeds, Liverpool and London’. 


THINK THEN DO 

The Haldane report's key recommendations 
included something that we take for granted 
today: cabinet-level ministries for health and 
education. In another innovation that is also 
now mainstream, Haldane advised that these
ministries would need access to the best 
available advice. For example, an education 
ministry would need counsel from experts in 
childhood development, and a health min- 
istry would need guidance from scientists 
working on infectious diseases. His ideal 
ministers were people whose time was freed 
from operational matters to be able to think 
and plan. 

The most radical suggestion in the report 
was for an entirely new ministry of “research 
and information”. Haldane dared to suggest
that its leader should be not a party politician
(the convention then, as now), but “essen- 
tially a trained thinker”. 

The report envisaged this ministry as 
a blend of government think tank and 
research funder. It urged that “better provi- 
sion should be made for enquiry, research 
and reflection before policy is defined and 
put into operation”.

The historian David Edgerton has rightly 
pointed out that the original report does 
not mention a ‘Haldane principle’ (see 
go.nature.com/2qybjbn). So where did the 
moniker arise? In an unpublished memo 
written probably in February or March of 
1918, six months into the inquiry, Haldane 
mentions three “principles” for reorganizing 
government’. The first — a “new principle 
to be recognised as fundamental” — is for 
government and policymakers to develop “a 
habit of mind, a disposition to insist on the 
systematic study of questions before [policy] 
action is taken”. (The other two focused on
the rationale for different ministerial jobs 
and better financial accountability from 
government departments.) The memo’s 
tone is much more direct than that of the 
final report, suggesting that its intended
audience was probably Prime Minister 
David Lloyd George. 

In words that ring true today, Haldane 
adds: “A Prime Minister is chosen as the 
leader of the nation largely because of his gifts 
as its spokesman ... But he has to shape pol- 
icy, and to this end requires the most highly 
skilled assistance, if he is not to be a bungler.”

Despite this progressive thinking, there 
is no sugar-coating the fact that Haldane 
was an imperialist®. The needs of the British 
Empire were a strong factor in his calcula- 
tions for science in government. There were 
railways to be built, botanical and geological 
surveys to be done, new languages and legal 
systems to be mastered — and catastrophic 
famines and outbreaks to be tackled, notably 
in India. All of this demanded engineers and 
scientists’. 

Haldane’s wish for an over-arching 
ministerial research department never 
materialized. It is a brave government that 
would prioritize study, thought and reflection 
in the making of policy. But traces of the Hal- 
dane ideal can be seen in what was to follow. 

His ideas are reflected today in the work 
of the scientists attached to the ministries 
dedicated to science, technology, innova- 
tion and higher education. These are largely 
responsible for organizing and funding 
teaching and research in universities and in 
public laboratories. They also seek expert 
counsel. In a few countries — notably 
Germany and the United Kingdom — they 
are also involved in industrial policy. 

Students in Budapest protest in 2017 over government interference in universities.

Richard Burdon Haldane advised in 1918 that governments need access to the best expert advice.

The ideal of independence also informs
the work of chief science advisers, whose
offices might be attached to those of heads of
government or to departments from food to
forestry, transport to trade. Since 2014, they
have been part of the International Network
for Government Science Advice (INGSA), 
created to hone practice. The difference is 
that Haldane wished for such expertise to 
operate closer to the apex of government, 
and to be accountable to Parliament. 


INDEPENDENCE DAY 

When Haldane’s report landed on the prime
minister’s desk, it had little impact: the end 
of the First World War was a busy time for 
statecraft. There were peace treaties to be 
agreed and a domestic economy to be stead- 
ied. The Ottoman empire was collapsing, 
and Britain and France were competing for 
influence in its former territories. 

It was in the years during and after the 
Second World War that Haldane’s idea of 
independent advice resurfaced. Scientists 
and engineers from many countries had 
created the technologies that were crucial 
to the Allies’ victory, such as radar and the 
atomic bomb. These needed a degree of 
operational distance from politicians — a 
hard-won achievement, as writer C. P. Snow 
describes entertainingly in his 1961 book Sci- 
ence and Government (Harvard Univ. Press; 
see also J. Baker Nature 459, 36-39; 2009). 

US scientists who had held prominent 
policy roles during the Second World War 
— such as Vannevar Bush — spied an oppor- 
tunity. Bush’s post-war report, Science: The 
Endless Frontier, was an appeal to US leaders 
that if scientists could help to win the war, 
they could also help to hold the peace”. Bush 


noted that they would need federal funding 
and, crucially, would require politicians to 
stay at arm’s length.

And so it proved. Year after year, when 
governments respected the independence 
of the scientists they tapped for advice, the 
results were genuinely world-changing. 
Examples include the first generation of sci- 
entists who created Green Revolution agri- 
cultural technologies in the 1950s and 1960s, 
and the researchers whose findings led to 
the Montreal Protocol to protect the ozone 
layer in 1987. The Kyoto climate protocol of 
1997 was a direct result of the efforts of the 
Intergovernmental Panel on Climate Change, 
whose members, although nominated by 
governments, fight hard to work without their 
paymasters peeping through the keyhole". 

But Haldane’s world of the honest broker 
starts to break down when governments stop 
keeping their side of the bargain. 


BEWARE BUNGLERS 

That is what is happening now, as an expand- 
ing network of populist political move- 
ments derides independent scholarship. For 
instance, Britain’s staunchest supporters of 
the campaign to leave the European Union 
(‘Brexiteers’) disdained expert warnings of 
the economic and environmental costs. In his 
election campaign, Brazil's new president, Jair 
Bolsonaro, pledged to roll back the country’s 
historical commitments on deforestation and 
climate change. And last month, Michael 
Ignatieff, rector of the Central European
University in Budapest, announced that 
the university will be relocating to Vienna 
because of sustained interference in its opera- 
tions by Hungary’s right-wing government. 

Meanwhile, some scientists are so 
concerned by the ransacking of the US 
Environmental Protection Agency (EPA) 
by President Donald Trump’s White House 
that they have reportedly set up a shadow 
EPA in preparation for the next administra- 
tion, so that valuable knowledge isn't lost. 
And in Australia, former education minister 
Simon Birmingham was unapologetic when 
it emerged that he had vetoed 11 grants worth 
Aus$4.2 million (US$3 million) that had 
been cleared for funding by the Australian 
Research Council. 

There are other examples, and there will 
be more as populism strengthens its grip on 
those who suffered as a result of the 2008 
financial crisis. And that is what makes the 
original Haldane report a remarkable docu- 
ment, worth recalling now. With national 
security under threat, Haldane’s committee 
could have demanded fealty from scientists 
and engineers. It could have insisted on ideo- 
logical litmus tests. It did no such thing. 

Today, more than ever, the authentic Hal- 
dane principle — and its origin story — must 
be cherished. In a world laid waste by war, a 
politician argued persuasively for a check on 
the power of the very corridors he walked. 
Haldane died in 1928, having no inkling that 
his Machinery of Government report would 
be talked of a century later. Its lasting legacy 
is the insight that the truth, often expendable 
in politics, must not be so in science advice.
Which book will be a bestseller? Sales are often driven by word of mouth rather than quality.


SOCIAL NETWORKS 


The secrets of status 


There are laws underlying success and fame. Mark Buchanan savours a study of them. 


A few years ago, physicist César
Hidalgo and his team devised a way
to rank the most famous people of
all time. The criterion used for the Pantheon 
project was the number of languages a per- 
son's Wikipedia pages appear in. The most 
famous musician? Jimi Hendrix. The most 
famous American? Martin Luther King 
Jr. Perhaps inevitably, the classical Greek 
philosopher Aristotle heads the entire list. 

Among celebrities, reality-television 
star Kim Kardashian comes 14th, although 
her fame clearly outweighs any recorded 
achievement. Such paradoxical mismatches 
of fame and attainments, as physicist and 
network scientist Albert-László Barabási
writes in The Formula, reflect deep social 
laws that can be understood through 
science. Success and recognition in many 
realms have only a tenuous link to effort, 
skill or inherent excellence. Often, they are 
determined by less obvious factors of human 
behaviour that influence how attention flows 
through social networks. 

The Formula is a fun, fast, first-hand 
account of efforts to use big data to pull 
back the curtain on our collective dynamics.
As Barabási shows, hidden statistical
and multiplicative processes have huge con- 
sequences in the human world, yet often 
operate outside our general awareness. 
Many of the effects described wouldn't be
surprising in a system of inanimate matter,
such as cracks growing irregularly in a brittle
material. It would just be ordinary statistical
physics. But these unstable cascades are
eye-opening when documented in systems
of free-willed people, where they have major
impacts on people's lives. They
might propel one individual vastly beyond
another of identical skill, or drive rapid,
unpredictable transformations of social
norms such as smoking.

The Formula: The Universal Laws of Success
ALBERT-LÁSZLÓ BARABÁSI
Little Brown (2018)

Barabási describes how exploiting big
data collected from the web, including 
social media and other digital repositories, 
is helping researchers to tease out how suc- 
cess and performance actually relate to one 
another. If one song is more popular, or 
one person more wealthy, is it because of 
inherent differences, or merely the result 
of luck and random amplification? Even 
20 years ago, such questions were subject 
only to ideological debate, not scientific 
exploration. Science has changed the game.

Barabási considers puzzles in music,
science, wealth, sports and wine. In some 
arenas, such as competitive tennis, skill and 
prowess are decisive. In others — including 
whether a book is pulped or becomes a best- 
seller — quality seems to be overwhelmed by 
network effects, such as the tendency to flock 
towards books that have already sold well. An 
intriguing example is German pilot Manfred 
von Richthofen (the ‘Red Baron’). Remembered
as one of the First World War's top flying
aces, he was objectively outperformed on
many counts by an almost-forgotten French 
counterpart, René Fonck. Von Richthofen’s 
lasting celebrity arose, Barabási shows, in
part from his early death as a war hero, which 
made him useful for propaganda. 

How it works in general, Barabási
suggests, is now becoming clear owing to 
the emergence from research of a number 
of simple “laws of success”. 

The first is that “performance drives 
success, but when performance can’t be 
measured, networks drive success”. With 
competitive tennis, better athletes win repeat- 
edly, showing superiority. But when judging 
wine, it’s not easy to find objective means of 
ranking: repeated blind tastings, even among 
wine experts, lead to wildly fluctuating out- 
comes. When quality is hard to measure, 
observed differences in success — judged
by popularity or sales, for example — follow 
from network effects. People rush to buy an 
early leader, swayed by the mistaken belief 
that others’ choices tell them about quality.

This results in huge differences in outcome 
that have nothing at all to do with quality. 
That phenomenon is the subject of the second 
law: “Performance is bounded, but success is 
unbounded.” Take the top 100 wines entered
into a competition. Their true differences in 
quality, for example in clarity or varietal char- 
acter, are generally small: they're all produced 
by top winemakers using similar technology. 
Yet one wine, because of the amplifying power 
of social networks, might enjoy orders of 
magnitude more sales than others. 

Social scientists have known about such 
effects for decades, although research by 
many, including Barabási and his students,
has brought them into much clearer focus. 
The Formula also covers nuanced stud- 
ies showing how success can be predicted. 
So the third law that Barabási describes is:
“Previous success x fitness = future success.” 
Careful studies — for example, by network 
scientist Manuel Cebrian or complexity 
scholar Dashun Wang — have found that it’s 
possible to identify how much of a product's
popularity depends on its quality or fitness, 
and how much can be traced to random 
amplification by network effects. Detailed 
data on consumer ratings and sales over time 
gathered by a site such as Amazon.com can 
be used to disentangle herding effects (those 
tending to push the already popular towards 
further popularity) from real consumer 
preferences based on true perceived quality. 
This understanding can be used to forecast 
trends, but also to boost sales or consumer 
satisfaction. 
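
The third law can be read as a proportional-growth rule, in which the chance that the next buyer picks a product is proportional to its previous sales multiplied by its fitness. The sketch below is one hedged interpretation of that sentence, not the model used in the studies the book describes; the function name, the product names and all numbers are invented for illustration.

```python
def next_purchase_shares(previous_sales, fitness):
    """Share of the next purchases each product is expected to attract,
    taking 'previous success x fitness' as the weight for each product."""
    weights = {name: previous_sales[name] * fitness[name] for name in previous_sales}
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one product needs positive sales and fitness")
    return {name: weight / total for name, weight in weights.items()}


# Invented example: a mediocre early leader versus a better latecomer.
shares = next_purchase_shares(
    previous_sales={"early_leader": 900, "late_gem": 100},
    fitness={"early_leader": 1.0, "late_gem": 3.0},
)
for name, share in shares.items():
    print(f"{name}: {share:.0%} of the next purchases")
```

In this toy case the product with three times the fitness still attracts only a quarter of new purchases, because the herding term (previous sales) outweighs it.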

And then there's the fourth law: “While 
team success requires diversity and balance, 
a single individual will receive credit for the 
group’s achievements.” Analyses of highly 
successful teams in science or business show 
that which individuals get the most credit 
has nothing to do with who actually did the 
work. Credit is based on perception, and 
is a collective social phenomenon. Effec- 
tive teams absolutely require diversity, but 
society singles out lone individuals for the 
accolades. 

Altogether, The Formula offers a rich tour 
of research on how relatively simple feedback 
forces channel our lives in surprising and 
counter-intuitive ways. We might think that 
success ought to be determined by a person's 
skill and hard work. Yet, more important is 
how other people respond, by interacting in 
complex social networks. Even individual 
success is a thoroughly social matter. 


Mark Buchanan has written many books 
about network effects. He is based in 
Abergavenny, UK. 

e-mail: buchanan.mark@gmail.com 
Books in brief 


Quantum Space 

Jim Baggott OXFORD UNIV. PRESS (2018) 

Prolific physics writer Jim Baggott is back with a terrific page-turner 
on loop quantum gravity (LQG), the theory posited as a solution 
to that chasm in physics between quantum mechanics and the 
general theory of relativity. Baggott digs into the how and why of 
what LQG might reveal about “space, time and the universe”, tracing 
its evolution through the work of Abhay Ashtekar, Lee Smolin, 
Carlo Rovelli and others, to its current implications for, say, the physics 
of black holes. Baggott masterfully tenderizes the scientific chewiness 
and is careful not to over-egg what is, after all, a work in progress. 


The Republican Reversal 

James Morton Turner and Andrew C. Isenberg HARVARD UNIV. PRESS (2018) 
In the 1960s and 1970s, the US Republican Party — pressured by 
the era’s environmental movement — created the Environmental 
Protection Agency (EPA) and extended the Clean Air Act. Today, 
it busily eviscerates the EPA while denouncing climate change 
as a hoax. Environmental historians James Turner and Andrew 
Isenberg follow this reversal from Ronald Reagan’s presidency 
on, revealing how conservative ideologues hostile to science and 
bent on deregulation have gradually bolted US exceptionalism to 
anti-environmentalism. Searingly timely and cautiously hopeful. 


Gods and Robots 

Adrienne Mayor PRINCETON UNIV. PRESS (2018) 

More than two millennia before today’s explosion in robot 
manufacture, bards and philosophers toyed with the concept 
of imitating life. Classics scholar Adrienne Mayor's astonishing 
chronicle harks back to mythic automata, such as “evil fembot” 
Pandora and bronze giant Talos. And she examines real mechanical 
devices — flying doves, bellowing statues and gliding Buddhas — 
devised by virtuosic technicians from the Mediterranean to China. A 
third-century BC colossus crafted for Egyptian monarch Ptolemy II 
Philadelphus, for instance, could stand up, sit down and pour milk. 


Dark Commerce 

Louise I. Shelley PRINCETON UNIV. PRESS (2018) 

Illicit trade in human organs, wildlife, arms and rare woods has 
vastly expanded over the past three decades as communications 
and digitization have improved apace. Here, Louise Shelley, a 
leading researcher in the field, examines organized crime over four 
millennia. She unpeels its disturbing dynamics today through case 
studies such as Silk Road, a vastly lucrative cybersupermarket, and 
the much-documented illegal market in rhino horn (currently priced 
at US$60,000 per kilo). And she lucidly lays out the dark economy’s 
planetary costs, as it escalates biodiversity loss and deforestation. 


Atlas of Poetic Botany 

Francis Hallé with Eliane Patriarca, transl. Erik Butler MIT PRESS (2018) 
From the epiphytic ‘hanging’ plant Guzmania lingulata to the 
mushroom mimic Helosis cayennensis, compelling oddities crowd 
equatorial forests. Botanist Francis Hallé celebrates their spectacular 
weirdness in this sprightly homage, translated from French by 
Erik Butler. Alongside descriptions of clonal forests, underground trees 
and ‘dancing’ plants, Hallé sets playful stylized drawings explicating 
the strange behaviours, adaptations and coevolution of each species. 
It’s a vegetal parade that reminds us, yet again, how some chunks of 
Earth’s biosphere still smack of terra incognita. Barbara Kiser 
Heat and soil 
vie for waste 


The Renewable Heat Incentive 
scheme in the United Kingdom 
has led some industries to turn 
much of their organic waste 
into feedstocks for energy 
production. Previously, this 
material — such as biosolids 
from water processing — was 
used to replenish soil carbon. 
This switch might inadvertently 
undermine the global initiative 
to increase soil carbon to 
mitigate climate change, which 
was ratified in 2015. 

This potential conflict 
needs to be part of the policy 
discourse and research 
agenda for energy and the 
environment. Reductions 
in organic carbon hamper 
soil functions, such as 
water-holding capacity, and 
reduce crop productivity. 

But knowledge gaps remain 
about the necessary level, 
composition and quality of 
carbon required to maintain 
and improve soil health and 
fertility. 

The products of anaerobic 
digestion of organic wastes for 
heat production are widely used 
as soil conditioners. But they 
have a lower ratio of carbon to 
nitrogen than do conventional 
composted vegetation and food 
waste. It is not known just how 
low the ratio of such wastes can 
go before it becomes worthless 
to return them to the soil. 
Karen L. Johnson* Durham 
University, UK. *On behalf of 4 
correspondents (see go.nature. 
com/2dxkpus for full list). 


karen.johnson@durham.ac.uk 


Educate public about 
gene-edited crops 


Crop varieties created by gene 
editing could benefit farmers 
in developing countries by 
providing bigger yields with 
better nutrition and greater 
tolerance to stress. But the 
public’s suspicion and fear 
impede the application of 
plant biotechnology in regions 
where it would be most useful. 
International outreach efforts 
are gearing up to increase 
public understanding of the 
scientific principles behind 
the technology. This will help 
governments to make informed 
decisions about gene-edited 
crops. 

For example, secondary- 
school programmes run by 
universities in Malaysia and 
Ghana are educating the 
farmers, researchers and 
leaders of the future. Uganda's 
Biosciences Information Centre 
targets smallholders. And in 
the United States, Iowa State 
University’s Plant Breeding 
Education in Africa Programme 
provides free e-learning courses 
to universities in sub-Saharan 
Africa on the application of 
biotechnology and genomics in 
plant breeding. 

To increase this type of 
outreach, governments and 
development organizations need 
to invest in universities and 
secondary-school teachers, and 
provide them with the necessary 
resources. 

Walter P. Suza* Iowa State 
University, Ames, Iowa, USA. 
*On behalf of 5 correspondents 
(see go.nature.com/2qhy2cb for 
full list). 


wpsuza@iastate.edu 


Rat and bat hunt 
helps heal old rift 


In the British Solomon Islands 
Protectorate in 1927, a warrior 
named Basiana led the Kwaio 
resistance against colonial 
rule of the island of Malaita, in 
which 15 people — including 
an Australian and a Briton — 
were killed with spears and a 
few rifles. The London Colonial 
Office asked Australia to quell 
the ‘uprising’. In the months 
that followed, Australians and 
Solomon Islanders killed at 
least 60 Kwaio, desecrating 
shrines and violating cultural 
taboos. Eventually, Basiana 
surrendered and was hanged 
with six conspirators. For 
almost a century, these events 
have held back the Kwaio 
people, shaping their relations 
with ‘Europeans’. 

In 2015, however, the 
Kwainaaisi Cultural Centre 
and the Australian Museum 
began a collaboration in 
East Kwaio to search for two 
undescribed mammals — a 
giant rat called kwete (probably 
Uromys or Solomys), and a 
monkey-faced bat (Pteralopex). 
When unresolved tensions 
were threatening the safety of 
the personnel involved, Kwaio 
leadership saw the relationship 
developed with the Australian 
Museum as an opportunity for 
reconciliation. 

In July, we and other 
representatives and 
descendants of tribes and 
Australians affected in 
1927 met in the mountains 
of Malaita for traditional 
ceremonies, exchanging pigs 
and shell money to resolve the 
dispute. The watershed event 
has established us as genuine 
partners and is a beginning 
to peace among Kwaio tribes, 
Malaita, the Solomon Islands 
and ultimately with Britain. 

Our experience shows that 
all parties can benefit from 
biodiversity surveys if they 
respect local cultural processes 
and are built on mutual 
collaboration. 

Esau Kekeubata* Kwainaaisi 
Cultural Centre, Kwainaa’isi, East 
Kwaio, Malaita, Solomon Islands. 
Tyrone Lavery* Biodiversity 
Institute, University of Kansas, 
Lawrence, Kansas, USA. 

*On behalf of 6 correspondents 
(see go.nature.com/2zsd2x4 for 
full list). 


tlavery@fieldmuseum.org 


Reproducibility in 
public and private 


The reproducibility crisis in 
biomedical science seems to 
have alarmed industry more 
than the academic community 
(see C. G. Begley and L. M. 
Ellis Nature 483, 531-533; 
2012). In our view, this is 
because they have different 
yardsticks for success in 
research. 

Despite the advent of 
important new therapeutics, 
the number of innovative 
treatments reaching the patient 
is disappointingly low. To 
help rectify this, industry is 
investing in drug-discovery 
alliances with peers and 
academic groups, and in 
precision medicine. It sees 
high standards of research 
quality as the route to the most 
promising drug candidates 
and to maximum return on 
investment. 

By contrast, academic 
scientists may be reluctant to 
devote extra time and effort to 
confirming research results in 
case they fail. That would put 
paid to publication in high- 
impact journals, damage career 
opportunities and curtail 
further funding. Evidence of 
questionable practices such 
as selective publishing and 
cherry-picking of data indicates 
that rigour is not always a high 
priority. 

Paradoxically, the impact 
of high standards on research 
objectives is different in 
industry and in academia. If 
ignored, this paradox could 
endanger future collaborations 
between scientists in the 
private and public sectors. 
Anton Bespalov* Heidelberg, 
Germany. 

Adrian G. Barnett Queensland 
University of Technology, 
Brisbane, Australia. 

C. Glenn Begley* BioCurate, 
Melbourne, Australia. 
*Competing interests declared (see 
go.nature.com/2retftw for details). 
anton.bespalov@paasp.net 
Osamu Shimomura 


(1928-2018) 


Chemist who illuminated bioluminescence. 


Growing up during one of 
the darkest times in history, 
Osamu Shimomura devoted 
his long and fruitful career to under- 
standing how creatures emit light. 
He discovered green fluorescent pro- 
tein (GFP), with which — decades 
later — biomedical researchers 
began to monitor the workings of 
proteins in living tissue, and to con- 
firm the insertion of genes. For that 
discovery, he shared the Nobel Prize 
in Chemistry in 2008 with neuro- 
biologist Martin Chalfie and the late 
Roger Tsien, a chemist. 

Shimomura, who died in 
Nagasaki, Japan, on 19 October, was 
the first to show that a protein could 
contain the light-emitting appara- 
tus within its own peptide chain, rather than 
interacting with a separate light-emitting 
compound. The significance of this discov- 
ery was that the gene encoding GFP could, 
in principle, be copied (or ‘cloned’) and used 
as a tool in other organisms. Others eventu- 
ally took that step, but it would have been 
impossible without the exemplary patience 
of Shimomura, who spent years gathering 
enough material to extract, purify and deter- 
mine the chemical structure of GFP. 

Born on 27 August 1928 in the town of 
Fukuchiyama, at the height of Japanese 
expansionism, Shimomura was the son of 
an army captain whose frequent postings 
abroad disrupted his child’s school educa- 
tion. Shimomura’s grandmother instilled in 
him the samurai principles of honour and 
fortitude. In 1944, with the Pacific War turn- 
ing against Japan, he and his fellow school 
students were mobilized to work in a muni- 
tions factory in Isahaya, about 25 kilometres 
from Nagasaki. On 9 August 1945, he was 
at work when a blinding flash, followed by 
a huge pressure wave, signalled the drop- 
ping of the atomic bomb on the nearby city. 
He walked home under a shower of black 
rain. He later wrote that his grandmother's 
quick action in putting him straight in the 
bath might have saved him from the effects 
of the radiation. 

Without a high-school diploma, he 
despaired of finding a college place. Eventu- 
ally, Nagasaki Pharmacy College admitted 
him in 1948. On graduation, he worked for 
four years as an assistant in practical classes. 
He devised research projects in his own time, 
and his professor obtained permission for 
him to do a year of advanced study. He joined 
the laboratory of organic chemist Yoshimasa 
Hirata at Nagoya University, and his lifelong 
fascination with bioluminescence began. 

Hirata asked him to extract and purify 
a compound, luciferin, which enables the 
tiny marine crustacean Cypridina to glow 
in the dark. Hirata thought the results too 
uncertain for a PhD student, but because 
Shimomura was not registered for a degree, 
he allowed him to try. In just ten months, 
Shimomura made pure crystals of luciferin 
(O. Shimomura et al. Bull. Chem. Soc. Japan 
30, 929-933; 1957). “I learned that any dif- 
ficult problem can be solved by great effort,” 
he wrote in his Nobel biography. 

The luciferin paper brought an invitation 
for Shimomura to join the bioluminescence 
lab of biologist Frank Johnson at Princeton 
University in New Jersey. Three weeks after 
marrying Akemi Okubo in August 1960, 
Shimomura sailed to the United States, his 
travel paid for by a Fulbright scholarship. 
Johnson asked him to work on the jellyfish 
Aequorea, which has a ring of organs around 
the edge of its umbrella that emit blue light. 
In July 1961, Johnson, Shimomura and his 
wife, and several assistants and students 
made a road trip across the United States to 
collect hundreds of jellyfish, scooping them 
out of Friday Harbor in Washington state, 
cutting off the rings and transporting them 
to Princeton for analysis. 

In the face of scepticism from Johnson 
and others, Shimomura determined that 
the luminescent substance was a protein; he 
named it aequorin. He discovered almost 
at once that it was activated by calcium 
(later, aequorin became an essential reagent 
as a glowing marker of calcium release). 


Shimomura, his family and his 
research colleagues spent 19 sum- 
mers at Friday Harbor, collecting 
hundreds of thousands of jellyfish to 
obtain enough of the elusive material 
for a full structural analysis. Until a 
way of making genetically engi- 
neered aequorin became available in 
the 1990s, Shimomura freely shared 
his carefully harvested stocks with 
laboratories the world over. 

It was in the process of purifying 
aequorin that Shimomura discovered 
small amounts of GFP, which 
fluoresces green when aequorin 
emits its blue light. It took him and 
his team until 1979 to accumulate 
enough to explore how the protein 
works. Shimomura described GFP’s 
unprecedented incorporation of the light- 
emitting function within the protein chain 
(O. Shimomura FEBS Lett. 104, 220-222; 
1979), and then put GFP aside to work on 
wide-ranging studies of bioluminescence in 
other organisms. 

In 1994, Chalfie’s group reported the suc- 
cessful creation of bacteria and roundworms 
that could express GFP (M. Chalfie et al. Sci- 
ence 263, 802-805; 1994). Soon afterwards, 
Tsien and his colleagues created GFPs of dif- 
ferent colours (R. Heim et al. Nature 373, 
663-664; 1995). Others have extended the 
technique to vertebrates, with headlines 
about ‘glow-in-the-dark monkeys’ obscur- 
ing the method's great value in confirming 
the successful incorporation of genes from 
other organisms. 

From 1982 until his retirement in 2001, 
Shimomura worked at the Woods Hole 
Marine Biological Laboratory in Massachusetts 
with Akemi, who had continued 
to work as his research assistant. After their 
retirement — a concept that was clearly 
difficult for them — they moved their 
laboratory into their home and continued 
to work. Shimomura’s textbook Biolumines- 
cence: Chemical Principles and Methods was 
released in 2006, and in 2017 he published an 
autobiography, Luminous Pursuit: Jellyfish, 
GFP, and the Unforeseen Path to the Nobel 
Prize. 


Georgina Ferry is a science writer 
specializing in the history of the life sciences 
in Oxford, UK. Her many books include 
biographies of the crystallographers Dorothy 
Hodgkin and Max Perutz. 

e-mail: mgf@georginaferry.com 
Figure 1 | Strong winds at Port Foster, Deception Island, Antarctica. 


CLIMATE SCIENCE 


Warming linked to shifting winds 


During the most recent ice age, abrupt changes in the Arctic climate were transmitted through the ocean to Antarctica. An 
atmospheric link between the two hemispheres has now been identified across the Antarctic continent. SEE LETTER P.681 


NERILIE J. ABRAM 


There are no precise analogues in Earth’s 
past for the rapid warming that the 
world is now facing as a result of rising 
levels of greenhouse gases in the atmosphere. 
However, some periods of Earth’s 
history can reveal valuable details of the way 
in which different parts of the climate system 
respond and interact’. Of particular interest 
are intervals of large and abrupt warming in 
the Arctic that occurred episodically during 
the most recent ice age (about 115,000–11,700 
years ago). These climate shifts are known as 
Dansgaard–Oeschger events, and saw temperatures 
in Greenland jump by more than 10°C 
in a matter of decades”. On page 681, Buizert 
et al.’ report that these events altered the position 
of the westerly winds across the Southern 
Hemisphere, a finding that has implications 
for global ocean circulation and atmospheric 
carbon dioxide. 


Past rapid warming events in the Northern 
Hemisphere give researchers a way of address- 
ing fundamental questions in climate science. 
In particular, how do changes in the climate of 
one hemisphere affect that of the other? And 
how, and at what rate, do these changes propa- 
gate? The connection between hemispheres is 
important for determining how energy moves 
through the Earth system and alters the climate 
in different places around the globe. 

The simplest model to explain the relation- 
ships seen between Greenland and Antarctic 
temperatures during Dansgaard–Oeschger 
events is the ‘bipolar see-saw’ movement of 
heat between the hemispheres through the 
global ocean*. In this model, the Greenland 
temperature jumps abruptly into its warm 
phase when the overturning circulation — the 
sinking of surface waters to the deep ocean — 
in the North Atlantic Ocean speeds up. This 
adjustment in ocean circulation concentrates 
heat in the Northern Hemisphere and causes 
Antarctica to gradually cool. It takes about 
200 years for the ocean changes in the North 
Atlantic to start affecting the Antarctic temper- 
ature’. This lag reflects the time that it takes for 
accumulated energy to penetrate north of the 
current that circles Antarctica and to begin to 
be absorbed into subsurface levels of the global 
ocean’. 

The opposite happens when the overturn- 
ing circulation in the North Atlantic slows or 
stops, causing Greenland to shift quickly into a 
cold state similar to that associated with an ice 
age. It again takes about 200 years from when 
this rapid change occurs in the Arctic to when 
Antarctica begins to warm. The longer the cold 
state persists in Greenland, the more Antarc- 
tica warms through this see-saw mechanism 
of the deep ocean’. 

But this is not the full story. The atmosphere 
also provides a means by which climate sig- 
nals propagate between the hemispheres, in 
a much faster way than through the ocean. 
Abrupt warming events in the Arctic pull the 
meteorological equator — a band of tropical 
storm clouds that circle the globe near the 
Equator — farther north, and, along with it, 
the rainfall patterns associated with the Asian 
summer monsoon. In 2017, a reinterpretation 
of water-isotope signals in an Antarctic ice 
core identified a near-instantaneous response 
of atmospheric circulation to changes in Arc- 
tic climate that occurred in the most recent 
ice age, all the way south to West Antarctica’. 
However, whether this response occurred 
throughout the Southern Hemisphere, or was 
more localized, remained unclear. 

Buizert and colleagues present the first 
Antarctic-wide evidence for a rapid atmos- 
pheric coupling of the position of the westerly 
winds around the whole of the Southern 
Ocean to past abrupt climate events in the Arc- 
tic (Fig. 1). Identifying these pervasive fluctua- 
tions in wind position, which happened on a 
decadal timescale tens of thousands of years 
ago, required the precise synchronization of 
ages for ice cores from across the Antarctic 
continent. 

Ice-core ages from Greenland have been 
linked to those from Antarctica using the 
methane composition of bubbles in the exceed- 
ingly well-resolved ice core from the West Ant- 
arctic Ice Sheet Divide’. Atmospheric methane 
is quickly mixed across the hemispheres, and 
so can be considered as globally synchronous. 
Past fluctuations in methane abundance mim- 
icked abrupt changes in Greenland tempera- 
ture and therefore provide a way of precisely 
interrogating the timing of climate events 
between the Arctic and the Antarctic. 

Buizert et al. took the next step in synchro- 
nizing the West Antarctic Ice Sheet Divide 
record with four other Antarctic ice cores by 
identifying characteristic sequences of vol- 
canic eruptions preserved in the sulfate levels 
in Antarctic ice. Only then were the authors 
able to identify the superimposed oceanic 
and atmospheric signals that occurred across 
Antarctica in response to past rapid changes 
in Arctic climate. 

The classic see-saw of heat between the 
hemispheres through the ocean can explain the 
delayed and gradual changes in Antarctic tem- 
perature that accompanied past abrupt shifts 
in Greenland temperature. But Buizert and 
co-workers’ study suggests that superimposed 
on these slow ocean changes was an almost 
synchronous northward shift in the westerly 
winds circling Antarctica when Greenland 
moved into its warm phase — and, vice versa, 
a southward shift in these winds during cool 
Greenland events. This atmospheric response 
modulated the latitude in the Southern Ocean 
that formed the source of the moisture that fell 
as snow over Antarctica. 

A one-to-one relationship has previously 
been identified between the duration of Green- 
land temperature events and the magnitude of 
the ensuing temperature response in Antarctica 
through the ocean mechanism’. Similarly, the 


authors find that the atmospheric response 
seems to scale so that stronger Greenland events 
result in a larger climatic signal in Antarctica 
and the Southern Ocean. An atmospheric link 
tying changes in Arctic climate to the Antarctic 
has previously been hypothesized on the basis 
of climate-model responses in experiments 
designed to mimic aspects of Dansgaard- 
Oeschger events®. The current work provides 
the observational data to prove the existence of 
this link. 

It is time to move beyond considering only 
the Atlantic Ocean and century-scale time lags 
when thinking about how the Arctic and the 
Antarctic are climatically connected’. Buizert 
and colleagues’ identification of a rapid atmospheric 
link between climates at the poles has 
implications for our understanding of current 
climate change. Today, the Arctic is warming 
at about twice the rate of the global average; 
however, continent-scale warming of the Ant- 
arctic that is expected from climate simulations 
has not yet been clearly observed”. Changes 
in Antarctic sea ice are also not following 
expectations based on models'’. Meanwhile, 
the westerly winds of the Southern Hemi- 
sphere have been shifting rapidly southwards, 
affecting water security in cities such as Perth 
in Australia and Cape Town in South Africa, 
and potentially having global consequences by 
altering the movement of heat and carbon diox- 
ide between the atmosphere and the ocean". 
Many challenges remain in accurately 
predicting how, and how quickly, the behav- 
iour of Antarctica and the Southern Ocean will 
change in a warming climate. Nevertheless, the 
authors have provided a glimpse of the natural 
changes in behaviour — both rapid and slow 
— that occurred tens of thousands of years ago. 
These results provide a basis for progress in 
unravelling the current scientific mysteries of 
how the ocean and the atmosphere at the poles 
respond to rapid changes in climate. 
A mosaic mutation 
mechanism in the brain 


Variable brain-specific mutations have been observed in Alzheimer’s disease. 
One mechanism underlying this mosaicism involves integration of variant gene 
copies back into the neuronal genome. SEE ARTICLE P.639 


GUOLIANG CHAI & JOSEPH G. GLEESON 


Genetic mutations can arise not only 
in fertilized eggs, affecting all cells of 
an organism, but also in a subset of an 
organism's cells’ *. The latter phenomenon, 
called mosaicism, is prevalent in the brain, and 
has been associated with several neurological 
disorders, including sporadic Alzheimer’s dis- 
ease, the most common form of the disease’. 
In 2015, it was found’ that neurons from peo- 
ple with sporadic Alzheimer’s contained more 
DNA and had more copies of the Alzheimer- 
related gene amyloid-β precursor protein (APP) 
than did neurons from people without the 
disease. However, the exact genomic changes 
underlying this mosaicism remained unre- 
solved. Lee et al.° follow up on that work on 
page 639, providing a mechanism for increased 
APP mosaicism in the brains of people with 
sporadic Alzheimer’s disease. The study 
could alter our understanding of the roots of 
neurodegeneration. 

First, Lee et al. set out to analyse APP variants 
in neuronal messenger RNA. In each experi- 
ment, the authors used mRNA from just 
50 neurons from the brains of people with or 
without sporadic Alzheimer’s, because averag- 
ing across large neuronal populations could 
mask variants present in only a few cells. The 
researchers’ analysis revealed many APP 
mRNA variants. As expected, the variants 
lacked introns — non-protein-coding regions 
that are removed during gene transcription 
through a process called splicing, leaving only 
protein-coding exons. However, the variants 
were shorter than expected, and contained 
single-nucleotide mutations, inserted and 
deleted exons, and larger deletions that led to 
the formation of new exon-exon junctions 
between missing multi-exon regions. Some of 
the mutations the authors observed have been 
previously implicated in familial Alzheimer’s 
disease’. 

Lee and colleagues found the same short 
variants when they analysed genomic DNA 
from the neurons, suggesting that APP-variant 
mRNAs might be transcribed from matching 
genomic DNA sequences — named genomic 
complementary DNAs (gencDNAs) by the 
authors — that had become permanently 
embedded in the genomes of neurons. 
To further validate the existence of APP 
gencDNAs in neurons, the authors used 
two independent approaches: a technique 
called DNA in situ hybridization (DISH), in 
which fluorescent molecules were bound to 
gencDNA-specific exon-exon junctions in 
DNA; and sequencing of short sections of 
APP DNA. Both approaches confirmed the 
existence of gencDNA variants. 

The researchers next investigated the extent 
of gencDNA diversity using DNA sequenc- 
ing. In total, they identified 6,299 different 
APP gencDNA variants in 96,424 neurons 
from the brains of 5 people with sporadic 
Alzheimer’s — approximately 10 times more 
than they found in the brains of people without 
the disease. In agreement, DISH also revealed 
substantially more gencDNAs in Alzheimer’s 
neurons. 

The authors demonstrated that APP 
gencDNAs were present in the neurons of 
a mouse model of Alzheimer’s disease, but 
rarely in non-neuronal cells or neurons from 
control animals. Moreover, gencDNA vari- 
ants accumulated with age. These findings 
are consistent with a role for APP gencDNA 
variants in the development of Alzheimer’s. 
Indeed, the authors found that some APP 
mRNA variants are translated into proteins 
that are toxic to cells, further strengthening 
this possibility. 

Finally, Lee and co-workers showed that 
gencDNAs could be generated in cells in cul- 
ture, provided that two conditions were met. 
First, the cells’ DNA had to contain breaks in 
its strands, and, second, the enzyme reverse 
transcriptase had to be active. This enzyme is 
responsible for a process called reverse tran- 
scription, in which matching DNA sequences 
are produced from mRNA. The data indicate 
that gencDNAs arise from reverse-transcribed 
mRNA intermediates, which are incorporated 
into the genome in a process that might be promoted 
by breaks in DNA (Fig. 1). In support 
of this idea, the authors detected reverse tran- 
scriptase activity in the human brain samples, 
and a previous study has shown the presence 
of DNA breaks in developing brains*®, whereas 
this phenomenon is rarely observed in other 
tissue types. 

The incorporation of gencDNAs into the 
genome might share some mechanisms with 
retrotransposition, a process in which RNA 
transcribed from DNA sequences called 
transposable elements can reintegrate into 
new genomic regions to generate mosaicism’. 
But how gencDNAs become mutated from 
the original APP sequence remains unknown. 
Perhaps the mutations arise from mis-splicing 
of mRNA, or during genomic integration of 
gencDNAs. 

Figure 1 | Mosaic incorporation of APP variants 
into the neuronal genome. a, The gene amyloid-β 
precursor protein (APP) contains protein-coding 
exons (coloured blocks) and non-coding introns 
(this simplified schematic of the gene does not 
reflect the actual exon-intron composition). 
During transcription, introns are removed 
through a process called splicing to produce 
messenger RNA, which is translated to form the 
wild-type (WT) protein. b, Lee and colleagues® 
found that, in neurons in the human brain, 
APP mRNA undergoes a process called reverse 
transcription to produce a complementary DNA 
(cDNA). The cDNA can be reintegrated into the 
neuronal genome as a genomic cDNA (gencDNA). 
At some point in the process, mutations arise, 
perhaps when the cDNA is integrated into the 
genome, or at an earlier stage (not shown). This 
results in a range of gencDNA APP variants, 
some lacking one or more exons. Some gencDNA 
variants give rise to toxic proteins, leading to 
cell death. These processes might contribute to 
sporadic Alzheimer’s disease. 

Taken together, Lee and colleagues’ 
work reveals the surprising existence of a 
phenomenon known as somatic gene recombi- 
nation in the brain. This phenomenon, which 
has previously been reported only in anti- 
body generation in immune cells’, increases 
the diversity of proteins encoded by a given 
gene through DNA-shuffling mechanisms.
The study hints at a previously unanticipated
mechanism in the development of Alzheimer’s, 
and expands our understanding of the genesis 
of brain mosaicism. But whether accumula- 
tion of gencDNAs in neurons is a cause of or 
is caused by Alzheimer’s disease remains to 
be proved. 

The techniques used here could be applied 
to investigate whether gencDNA mechanisms 
are at work in other genes in other tissues; 
this could provide insights into diseases such 
as cancer or other degenerative disorders. 
However, it remains possible that gencDNA
production is specific to APP or to neurons. 
The authors did not find gencDNA variants 
in another gene involved in Alzheimer’s, 
presenilin, but nor did they rule out the 
possibility that gencDNAs could arise from 
other genes. Neurons have many features that 
might make them particularly vulnerable to 
gencDNAs: they are long-lived, have mostly 
stopped dividing, and have higher levels of 
reverse transcriptase activity and DNA-strand 
breaks than do non-neuronal cells’. 

It is also unclear whether the integration 
of APP gencDNAs into DNA is random or is 
biased towards certain genomic regions. The 
development of more-powerful sequencing 
techniques should help to answer this question. 

Of course, there are many other avenues 
for further research. For instance, whether 
gencDNAs co-opt the retrotransposition 
and integration pathways used by transpos- 
able elements remains to be tested. The fact 
that gencDNAs are found in normal neurons 
suggests that they could have some ben- 
efits — this possibility should be examined. 
Finally, it will be interesting to test whether 
inhibitors of reverse transcriptase can pre- 
vent the accumulation of gencDNAs. Only 
when these avenues have been explored will 
we be able to build a complete picture of the 
remarkable phenomenon observed by Lee and 
colleagues.


In Retrospect


A spotlight on bacterial 
mutations for 75 years 


In the debate about how bacterial mutations arise, an experiment in 1943 showed 
that they can occur spontaneously and independently of a selection pressure. 
This study also popularized the use of maths-driven analysis of biological data.


MANOSHI S. DATTA & ROY KISHONY 


Do bacteria acquire mutations
randomly, or do mutations arise
adaptively as a direct response to
environmental pressures? This question 
has wide implications in areas ranging from 
evolution to the treatment of bacterial infec- 
tions. In 1943, writing in Genetics, Luria and 
Delbrück¹ revealed, by a combination of
experimental analysis and profound math- 
ematical insight, that bacteria evolve through 
random mutations that arise independently 
of an environmental stress, and that occur 
even before bacteria encounter such selective 
conditions. Their study was a milestone in a 
debate about the nature and causes of bacter- 
ial evolution that is still ongoing. Moreover, 
this work has inspired the fields of microbial 
evolution and quantitative biology. 

Luria and Delbrück worked at a time when
scientists disagreed on the fundamental nature 
of bacterial evolution’, despite tremendous 
advances in molecular biology and micro- 
biology. For plants and animals, there was 
a general consensus that, consistent with 
Charles Darwin’s theory of evolution, natu- 
ral selection acted on mutations that arose 
randomly, regardless of their benefit to the 
organism. However, the unusual nature of bac- 
terial genetics — such as the absence of sexual 
reproduction — sparked a vigorous debate 
about whether the principles that drive animal 
evolution also apply to bacteria (see
go.nature.com/2brojqp). The main alternative
hypothesis was Lamarckian evolution, named after the
French biologist Jean-Baptiste Lamarck. In this 
model, the specific mutations that provide an 
advantage to an organism are acquired directly 
in response to the organism's environment’. 

For present-day microbiologists, this debate 
might seem strangely contrived — after all, if 
other organisms evolve in a manner consist- 
ent with the Darwinian principles of randomly 
occurring organismal variation that selection 
can act on, why should bacteria be an excep- 
tion? Yet, it's worth having sympathy for our 
scientific predecessors. Even though we now 
accept that bacteria evolve through Darwinian 
mechanisms, ‘quasi-Lamarckian’ processes of
bacterial evolution are still being discovered 
and debated*®. 

Luria and Delbrück themselves encountered
some difficulties when they entered the debate
about how bacterial evolution occurs. To 
establish an approach to study mutations in 
bacteria, they allowed individual Escherichia 
coli cells to grow into large populations in indi- 
vidual test tubes, and added the cells from each 
of these tubes to Petri dishes containing agar 
coated with viruses known to kill the bacteria. 
Luria and Delbrück monitored the number
of visible bacterial colonies on each of the 
plates. Each of these virus-resistant colonies 
arises from a cell and its descendants that had 
a mutation enabling the cells to survive the 
viral attack. Yet, for a simple experiment, their 
results were initially confusing: the number 
of colonies was highly variable between the 
different plates, a result that the authors ini- 
tially attributed to an experimental error (see 
go.nature.com/2brojqp). But in a moment of 
clarity, Luria realized’ that the high variability 
in the number of bacterial colonies might be 
an important clue, not an error. 

Let’s consider the experimental variance 
in the number of virus-resistant colonies per 
Petri dish expected under the process of either 
adaptive or random mutation. If mutations 
arise by an adaptive process, each bacterial 
cell would have a chance of acquiring a resist- 
ance mutation only on encountering the virus. 
Assuming each cell’s chance of becoming resist- 
ant is small, the prediction would be that the 
number of virus-resistant colonies per Petri 
dish would vary according to a Poisson distri- 
bution (a standard probability distribution for 
random events, in which the standard deviation 
of the data equals the square root of the mean). 
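The Poisson prediction for the adaptive scenario can be checked with a small simulation. The sketch below is only an illustration under assumed numbers: the plate count, cells per plate and per-cell resistance probability are hypothetical and are not taken from the 1943 experiments. Each plated cell gets one independent, small chance of becoming resistant, and the ratio of variance to mean should then come out close to 1.

```python
import random
import statistics

def adaptive_colony_counts(n_plates, cells_per_plate, p_resist, seed=1):
    """Colony counts per plate if resistance arises only on contact with the virus."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_plates):
        # Every plated cell independently gets one small chance to become resistant.
        resistant = sum(1 for _ in range(cells_per_plate) if rng.random() < p_resist)
        counts.append(resistant)
    return counts

# Hypothetical parameters chosen so that each plate carries about two colonies.
counts = adaptive_colony_counts(n_plates=200, cells_per_plate=5_000, p_resist=4e-4)
mean = statistics.fmean(counts)
variance = statistics.pvariance(counts)
print(f"mean = {mean:.2f}, variance = {variance:.2f}, variance/mean = {variance / mean:.2f}")
```

A variance-to-mean ratio near 1 is the signature that a fluctuation test would look for if mutations were induced by the virus.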

But, if evolution is driven by random muta- 
tions, mutations that confer viral resistance 
would arise during the growth of the bacterial 
population before viral exposure. In this case, 
the experimental variance in the number of 
virus-resistant bacterial colonies between dif- 
ferent Petri dishes would be much higher than 
in the adaptive-mutation scenario, because the 
number of virus-resistant bacteria in a given 
test tube would depend on the random timing 
of when mutations occurred. A single virus- 
resistance mutation that occurred early in 
the growth of the bacterial population would 
result in a large number of virus-resistant 
bacterial descendants of the original mutated 
cell, whereas mutations that arose much later 
during the growth of the bacterial culture, just
before viral encounter, would produce many
fewer virus-resistant bacteria.


50 Years Ago


After the wreck of the Torrey
Canyon in March 1967, some
8,000 seabirds were taken to
cleansing stations in Britain, but
well under ten per cent ... of these
birds were rehabilitated and returned
to the sea. Even this figure gives too
optimistic a picture of the cleansing
operation, for a large proportion of
the so-called rehabilitated birds were
recovered dead within a few days.
Although exact figures are hard to
come by, the Torrey Canyon episode
revealed the complete inadequacy of
the current methods of rehabilitating
oiled birds ... Legislation can never
totally eliminate accidental pollution
and it is estimated that even the
much vaunted “load on top” system
of washing tankers, although a great
improvement on previous practice,
produces pollution at a rate of
400,000 tons a year.

From Nature 30 November 1968


100 Years Ago


[U]ndoubtedly the war has been
responsible for an enormous
amount of destruction of capital;
but when estimates are given ... of
the percentage of loss in Belgium,
France, Italy, Serbia and other
countries, it is not usually borne in
mind that capital does not merely
consist of gold and silver, of bricks
and mortar ... or even of railways,
steamships and machinery ... but
of scientific knowledge ... When,
therefore, we compile estimates of
the losses due to the war, let us not
forget that our greatest asset, the
vast store of knowledge that Science
has gathered together for us ... is
still intact. It is a store that has slowly
been accumulating ever since the
beginning of the world, a store
which enables man more and more
to triumph over Nature, and one
that for ever remains practically
indestructible as the real permanent
capital of the race, and by far its
most precious heritage.

From Nature 28 November 1918

On the basis of this insight, Luria and Delbrück
generated a statistical distribution (the
Luria–Delbrück distribution) to describe
the prevalence of virus-resistant bacterial
mutants that would be expected if mutations
arose randomly before the bacterial population
came under selective pressure from the
virus. Compared with a Poisson distribution
expected for adaptive mutations, this
Luria–Delbrück distribution has a long ‘tail’ at the
end of the distribution pattern. In the context 
of the authors’ experiments, this tail would 
correspond to Petri dishes that have a high 
number of bacterial colonies, corresponding 
to early mutational events that lead to a large 
number of mutant descendants. 
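The long tail can be reproduced with a toy model of the random-mutation scenario. The following is a minimal sketch, not the authors' derivation: it assumes synchronous doubling from a single cell, a hypothetical mutation probability per newly formed cell, and no back-mutation. A mutation in an early generation founds a large resistant clone ('jackpot' cultures), which inflates the variance far above the mean.

```python
import math
import random
import statistics

def poisson_draw(rng, lam):
    """Knuth's method for a Poisson-distributed integer (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k

def mutants_in_culture(rng, generations, mutation_prob):
    """Resistant cells in one culture grown from a single sensitive cell."""
    wild_type, mutants = 1, 0
    for _ in range(generations):
        # Every cell divides; existing mutants pass resistance to both daughters.
        wild_type, mutants = 2 * wild_type, 2 * mutants
        # New mutations arise at random among the sensitive daughters.
        new = min(poisson_draw(rng, wild_type * mutation_prob), wild_type)
        wild_type, mutants = wild_type - new, mutants + new
    return mutants

rng = random.Random(7)
# Hypothetical parameters: 14 doublings (16,384 cells) and 500 parallel cultures.
mutant_counts = [mutants_in_culture(rng, generations=14, mutation_prob=1e-4)
                 for _ in range(500)]
mean_mutants = statistics.fmean(mutant_counts)
fano = statistics.pvariance(mutant_counts) / mean_mutants
print(f"mean = {mean_mutants:.1f}, max = {max(mutant_counts)}, variance/mean = {fano:.1f}")
```

In this model the variance-to-mean ratio is far above the value of about 1 expected for Poisson counts, which is the contrast that the fluctuation test exploits.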

The 1943 paper reported the results of the
authors’ experiments, termed fluctuation
tests, that took this mathematical approach
to analyse the number of virus-resistant
colonies in E. coli populations. The authors’
findings were consistent with mutant counts
following a Luria–Delbrück distribution rather
than a Poisson distribution, demonstrating
that bacterial mutations arose randomly, and
independently of an encounter with a virus.

Luria and Delbrück’s work shaped subsequent
studies of biology and evolution in
many ways. Luria himself was reported as 
saying that their fluctuation test removed bac- 
teria from “the last stronghold of Lamarckism” 
(see go.nature.com/2fbxujf). The fluctuation 
test is still a standard procedure for accu- 
rately measuring mutation rates in diverse 
systems, from bacteria® and yeast” to cancer 
cells’. Their study also popularized the use of 
E. coli and the viruses that attack it as a sim- 
ple experimental model system for biology”. 
Beyond its direct impact in laboratories, the 
experiment became a textbook example of how 
mathematical thinking combined with simple 
experimentation can lead to profound biological
insights’’. For their contributions to bacterial
and viral genetics, Luria and Delbrück
won the Nobel Prize in Physiology or Medicine 
in 1969 (which they shared with the biologist 
Alfred Hershey). 
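As a rough illustration of how a fluctuation test yields a mutation rate, one classic estimator (often called the p0 method) uses only the fraction of cultures that contain no resistant mutants. If mutation events per culture are Poisson distributed with mean m, that fraction is exp(-m). The counts and final population size below are made-up placeholders, not data from any study, and dividing m by the final population is an approximation whose exact form depends on the convention used.

```python
import math

def p0_mutation_rate(mutant_counts, final_population):
    """Estimate mutation events per culture (m) and an approximate rate per cell."""
    zero_cultures = sum(1 for count in mutant_counts if count == 0)
    if zero_cultures == 0 or zero_cultures == len(mutant_counts):
        raise ValueError("the p0 method needs some, but not all, cultures without mutants")
    p0 = zero_cultures / len(mutant_counts)
    m = -math.log(p0)  # from P(no mutation event) = exp(-m)
    return m, m / final_population

# Hypothetical colony counts from 20 parallel cultures (note the two 'jackpots').
hypothetical_counts = [0, 0, 1, 0, 3, 0, 0, 107, 0, 2, 0, 0, 0, 1, 0, 0, 22, 0, 0, 5]
m, rate = p0_mutation_rate(hypothetical_counts, final_population=2e8)
print(f"m = {m:.3f} events per culture, rate of about {rate:.2e} per cell")
```

Because it ignores the size of each resistant clone, this estimator is not distorted by jackpot cultures, although it discards information that other estimators use.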

Their insight into mutational processes 
also has implications in settings such as the 
clinic. In analogy to the original experiment, 
imagine a population of patients who have the 
same type of bacterial infection and who are 
being treated with the same antibiotic (the 
antibiotic replaces the virus as the selection 
pressure here). According to the random- 
mutation model, even if all else is equal 
among the patients, the number of antibiotic- 
resistant bacterial mutants initially present 
will vary highly between the patients, which 
could lead to markedly variable treatment out- 
comes. Because such high inherent variabil- 
ity in treatment efficiency reflects resistance 
mutations arising in a population before treat- 
ment, using DNA sequencing or other types of 
analysis to identify the presence and number
of antibiotic-resistant bacterial mutants before
treatment could improve our ability to predict 
treatment outcome. 

Did the Luria and Delbrück study really
close the door on Lamarckism? As far as bac- 
teria are concerned, the answer is much more 
complicated than the duo could probably ever 
have anticipated. 

It is undeniable today that randomly 
occurring mutations and natural selection 
are central tenets of how bacterial evolution 
occurs’. However, scientists are uncovering 
and debating an increasing array of other 
evolutionary processes at work in bacteria, 
some of which are suspiciously Lamarckian 
in character*®. For example, we now know 
that the genome-wide mutation rate, and 
even the mutation rates of specific genes, 
can be shaped by evolution and affected by 
the environment’*. An even more striking 
example is bacterial adaptation through the 
CRISPR-Cas viral-defence system, in which 
bacteria can incorporate viral genetic material 
into their own genomes and use it, as an adap- 
tive mechanism, to protect themselves and 
their descendants against current and subse-
quent viral attacks’*"’”. These quasi-Lamarckian
mechanisms presumably evolved by random 
mutations and natural selection. They do 
not necessarily undermine the lessons learnt 
from Luria and Delbrück’s work, but rather,
show the power of evolution to sculpt living 
organisms in endlessly interesting ways. 

It is intriguing to imagine an alternative 
scientific history that might have occurred 
if Luria and Delbrück had stumbled upon
one of these quasi-Lamarckian mechanisms.
The CRISPR-Cas defence mechanism is 
mainly repressed in the E. coli that they 
studied, but it is active in other bacterial 
species, such as Streptococcus thermophilus.
A fun challenge would be to repeat the
Luria–Delbrück experiment under conditions
that might favour the evolution of resistance
by such adaptive mechanisms, for example by
replacing E. coli with S. thermophilus. Would
the distribution of the number of resistant
mutants indicate random or adaptive
mutations? What would Luria and Delbrück have
concluded had they used a species that had
the CRISPR-Cas system? The contingency of
this historic choice underscores the fact that,
like evolution, science perhaps also progresses
both adaptively and randomly.


PALAEOANTHROPOLOGY


The not-so-dangerous
lives of Neanderthals


Have Neanderthals gained an unfair reputation for having led highly violent lives? 
A comparison of skulls of Neanderthals and prehistoric humans in Eurasia reveals 
no evidence of higher levels of trauma in these hominins. SEE LETTER P.686 


MARTA MIRAZON LAHR 


Injuries are part of everyday life, from a
scratch on the skin to a broken bone to a
fatal trauma. Although many injuries are
accidental, others can arise as a consequence
of an individual’s or a group’s behaviour, activity
or social norms, characteristics that tell
us about societies and the inherent tensions
and risks within and between different groups.
On page 686, Beier et al.¹ provide evidence
that challenges the long-standing view’ that 
Neanderthal populations experienced a level 
of traumatic injuries that was significantly 
higher than that of humans. The result calls 
into question claims”* that the behaviour and 
technologies of Neanderthals exposed them 
to particularly high levels of risk and danger. 
Reports of injuries and deaths are constantly 
in the news. As well as drawing us to read the
stories of individuals, such information is of
interest because of what it tells us about our 
societies. However, to fully understand what 
might determine the current degree of violence 
and injuries, we also need to look back at the 
past and identify the causal underpinnings. 
But how far back should we look? Argu- 
ably, right back to the evolutionary origins of 
processes that shape behavioural, social and 
cognitive tendencies and abilities. 

Anthropologists study skeletal remains to 
reconstruct aspects of ancient lives, building 
an ‘osteobiography’ that casts light on part 
of the life history of an individual. Skeletons 
preserve — in the form of holes, misshapen 
surfaces, bone misalignments and secondary 
fractures radiating out from a point of impact 
—a signature of the traumas that resulted in 
fractured, cut or perforated bones, even if the 
injuries subsequently healed*”. 

Traumatic lesions have been frequently 
identified in Neanderthal fossils, particu- 
larly in the head (Fig. 1) and neck, leading to 
the view’ that higher levels of skeletal injury 
occurred in Neanderthal populations than in 
human populations. However, this is not so, 
say Beier and colleagues. The authors assessed 
published descriptions of Neanderthal and 
modern human fossil skulls found in Eurasia 
from approximately 80,000 to 20,000 years ago. 
Comparing the number of injured and non- 
injured Neanderthal and human skulls, the 
authors report similar levels of head trauma 
in both groups. 

The power of Beier and colleagues’ analyses
lies in their study design. Instead of comparing 
Neanderthal data with those of more-recent 
or living human populations, as previous 
studies have done””, the authors based their 
comparisons on humans who not only shared 
aspects of their environment with Neander- 
thals, but whose fossil record also has a similar 
level of preservation. Beier et al. analysed data 
for 114 Neanderthal skulls and 90 human 
skulls. They gathered the data for 14 skull 
bones, and obtained information that ranged 
from 1 bone in poorly preserved fossils to 
data for all 14 bones per individual for well- 
preserved ones. In total, the authors recorded 
trauma incidence in 295 Neanderthal bones 
and 541 human bones. They also collected 
other information, such as the percentage of 
each of the 14 bones that was preserved for each 
individual, as well as details including sex, age 
at death and the fossil’s geographic location. 

Beier et al. ran two sets of statistical analy- 
ses — one based on the presence or absence 
of trauma in each of the skull bones, the other 
on individual fossil skulls as a whole — to 
test whether there were any statistically sig- 
nificant differences between the prevalence 
of trauma in the Neanderthal and human 
fossils. The authors also assessed whether 
trauma prevalence was linked to sex or age, 
taking into account fossil preservation, geo- 
graphic location and possible interaction 
effects between the different variables. 
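To make the idea of testing for a difference in trauma prevalence concrete, the sketch below applies a two-sided Fisher exact test to a 2 × 2 table of injured and uninjured skulls. This is a deliberately simplified stand-in, not the authors' analysis, which also accounted for preservation, sex, age and location. The group sizes (114 and 90 skulls) come from the text, but the injured counts are invented placeholders and say nothing about the real data.

```python
from math import comb

def fisher_exact_two_sided(table):
    """Two-sided Fisher exact p-value for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    denominator = comb(row1 + row2, col1)

    def probability(x):
        # Hypergeometric probability of x injured skulls in the first group.
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    observed = probability(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum every possible table that is no more probable than the observed one.
    return sum(
        p for p in (probability(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )

# Placeholder counts: [injured, uninjured] per group. The injured numbers are hypothetical.
hypothetical_table = [[15, 114 - 15],   # Neanderthal skulls
                      [12, 90 - 12]]    # human skulls
p_value = fisher_exact_two_sided(hypothetical_table)
print(f"two-sided p = {p_value:.3f}")
```

A large p-value in such a test would mean that the data give no evidence of a difference between the groups; it would not by itself show that the prevalences are equal.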
Figure 1 | A Neanderthal skull. The Neanderthal fossil called Saint-Césaire 1. This fossil’ shows signs of
a healed bone injury’ in the region indicated by the arrowheads. Beier et al.' assessed published analyses 
of ancient Neanderthal and human skulls, including that of Saint-Césaire 1. Contrary to the prevailing 
view’ that Neanderthal existence was more violent than that of humans, the authors report that similar 
levels of trauma are present in Neanderthal and human fossils. 


The two analyses gave similar results.
The more complete the fossils are, the more
likely they are to have preserved evidence of
injuries. This might seem obvious, but is an
issue often ignored in such studies. Beier et al.
offer a way to deal with this type of bias in
the available material. Once the authors take
into account the extent of fossil preservation,
the predicted prevalence of trauma in
Neanderthals and humans is almost the
same. Both Neanderthal and human males
had a much greater incidence of trauma
than did the females of their respective
species. This pattern remains the same
for humans today’.
One final intriguing result is that, although
traumatic injuries were present across all of
the age ranges studied, Neanderthals that had
trauma to the head were more likely to have
died under the age of 30 than the humans were.
The authors interpret this result as evidence
that, compared with humans, Neanderthals
either had more injuries when they were young
or were more likely to have died after being
injured.

Beier and colleagues’ study does not 
invalidate previous estimates of trauma 
among Neanderthals. Instead, it provides a 
new framework for interpreting these data by 
showing that the level of Neanderthal trauma 
was not uniquely high relative to that of early 
humans in Eurasia. This implies that Neander- 
thal trauma does not require its own special 
explanations, and that risk and danger were 
as much a part of the life of Neanderthals as 
they were of our own evolutionary past. The 
result adds to growing evidence that Neander- 
thals had much in common with early human 
groups. However, the finding that Neander- 
thals might have experienced trauma at a 
younger age than humans, or that they had a
greater risk of death after injury, is fascinating, 
and might be a key insight into why our spe- 
cies had such a demographic advantage over 
Neanderthals. 

Is this the final word on the subject of 
Neanderthal trauma? The answer is no. Beier 
and colleagues assessed only skull trauma. 
What if Neanderthals accumulated more inju- 
ries to their bodies than did humans? There are 
data suggesting that this might be the case’. 
Furthermore, although the authors’ analyses
demonstrate the power of a well-designed
study based on large samples, the data they 
used were recorded by many researchers and 
at varying levels of detail, raising the possibility 
of methodological biases. 

Lastly, the causes of the injuries could 
provide some elusive insights into behaviour, 
activities or social norms in the past. From the 
shape, location and extent of traumatic inju- 
ries in skeletons, and characteristics such as 
the sharpness of fracture edges or the degree 
to which injuries had healed, it is sometimes 
possible to establish the most likely cause of 
a trauma — for example, whether the injury 
probably arose as a consequence of a hunting 
accident’, interpersonal violence” or inter-
group conflict’’. Moreover, surviving severe
trauma might indicate that the injured person
was cared for by members of their society”. 
Establishing the likelihood of each of these 
scenarios among Neanderthals and early 
modern humans will no doubt continue to 
challenge scientists for many years to come.


A glimpse into
the heart of a quasar 


Astronomical objects called quasars have been difficult to study because of the 
limited spatial resolution of observations. An approach has been developed that 
allows the structure and dynamics of quasars to be investigated. SEE LETTER P.657 


ERIN KARA 


When the astronomical object 3C 273
was detected’, to most optical telescopes
it looked just like a star in
our Galaxy. But in 1963, astronomers dis- 
covered” that the object was shining from a 
distance of 750 megaparsecs (2.4 billion light 
years). Whatever this mystery object was, it 
was producing more radiation than a trillion 
stars, from a region no bigger than the Solar 
System. Objects such as 3C 273 are now known 
as quasars and are understood to be powered 
by hot gas and dust feeding into a supermassive 
black hole through a structure called an accre- 
tion disk. Fifty-five years after that remark- 
able discovery, 3C 273 is back in the limelight. 
On page 657, the GRAVITY Collaboration³
reports observations of the spatially resolved 
rotation of hot gas in the quasar at distances 
much closer to the black hole than were 
previously possible. 

A quasar can produce more energy than 
the entire galaxy in which it resides. Although 
the basic mechanism that powers a quasar is 
known, the anatomy of the supermassive black 
hole and its surroundings is not well under- 
stood. Where does the gas that feeds the black 
hole come from? And what effect does the 
resulting intense radiation have on the envi- 
ronment around the black hole? The findings 
of the GRAVITY Collaboration provide a way 
to answer these fundamental questions. 

Determining the structure of a quasar is
difficult because the black hole is extremely
small and far away from Earth, and therefore 
the gas orbiting close to the black hole 
cannot be directly imaged using telescopes. 
Instead, astronomers rely on the properties
of electromagnetic radiation coming from
a single point to infer the structure and 
dynamics of the gas and dust around the 
black hole. Such properties include colour, 
time variability, polarization and phase — the 
offset of an electromagnetic wave from a given 
position. 

For the past 30 years, our best understanding 
of gas in the vicinity of a quasar’s black hole 
has come from a method called reverberation 
mapping, which uses echoes of light (analo- 
gous to those of sound) to map out regions 
near the black hole’. The accretion disk 
emits light in all directions, some of which 
is observed directly by telescopes, and some 
of which illuminates a region of surrounding 
gas, known to astronomers as the broad-line 
region. Optical-reverberation mapping
measures how long it takes the broad-line region
to respond to illumination from the accretion
disk, which, in effect, measures the distance
between the disk and the surrounding gas°.
In a similar way to how bats use echolocation
to map out a dark cave, astronomers measure
light echoes to map out the hot gas around
black holes.

Figure 1 | Structure of the quasar 3C 273. Quasars are astronomical objects comprising a supermassive
black hole surrounded by hot gas and dust. As this material is pulled towards the black hole through a
structure known as an accretion disk, energy is released in the form of light and, in the case of the quasar
3C 273, as a beam of charged particles called a jet. The GRAVITY Collaboration³ reports a technique that
enables the rotation of gas in a part of 3C 273 known as the broad-line region to be spatially resolved. The
researchers determine that this gas moves perpendicular to the jet and has the shape of a thick ring.

The GRAVITY Collaboration has ushered 
in an alternative technique that spatially 
resolves the motion of such gas using the 
GRAVITY instrument in Chile’. This instru- 
ment is an interferometer that combines the 
light from four near-infrared telescopes that 
are 8 metres in diameter to produce a virtual 
‘super telescope’ that is 130 m in diameter. 
Because the spatial resolution of a telescope 
depends on its size, the use of the GRAVITY 
instrument is a giant step in imaging capabil- 
ity. The collaboration measured the offset in 
phase between the direct emission of light 
from 3C 273 and the light from the broad-line 
region to spatially resolve the motion of this 
gas in a distant quasar for the first time. 
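The gain from combining telescopes can be estimated with the usual diffraction scaling, in which the smallest resolvable angle is roughly the wavelength divided by the aperture or baseline. The sketch below assumes an observing wavelength of 2.2 micrometres, a typical near-infrared value chosen here for illustration and not stated in the text, and it ignores order-unity factors that depend on aperture shape.

```python
import math

RADIANS_TO_MILLIARCSEC = math.degrees(1.0) * 3600.0 * 1000.0

def diffraction_limit_mas(wavelength_m, aperture_m):
    """Approximate angular resolution (wavelength / aperture) in milliarcseconds."""
    return wavelength_m / aperture_m * RADIANS_TO_MILLIARCSEC

ASSUMED_WAVELENGTH_M = 2.2e-6  # assumption: near-infrared light at 2.2 micrometres

single_telescope = diffraction_limit_mas(ASSUMED_WAVELENGTH_M, 8.0)    # one 8-m telescope
super_telescope = diffraction_limit_mas(ASSUMED_WAVELENGTH_M, 130.0)   # 130-m virtual aperture
print(f"8 m: {single_telescope:.1f} mas, 130 m: {super_telescope:.1f} mas, "
      f"gain: {single_telescope / super_telescope:.2f}x")
```

Under these assumptions the combined array resolves angles about 16 times smaller than a single 8-metre telescope. Measuring phase offsets, as the collaboration did, can locate emission on scales finer still than this simple limit.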

The team observed a velocity gradient 
in the gas on size scales of 10 microarcsec- 
onds — an achievement that is comparable to 
seeing a coin on the Moon from Earth. The 
researchers found that the motion of this gas 
is perpendicular to the known large-scale jet 
(a beam of charged particles) projected from 
3C 273 (Fig. 1). The results suggest that the 
gas is in the form of a thick ring with a radius 
of 0.12 parsecs, rotating around a black hole 
that has a mass 300 million times that of the 
Sun. These findings support previous esti- 
mates from reverberation mapping of 3C 273 
that indicated a similar black-hole mass and 
gravitationally bound gas at a distance of 
0.08-0.34 parsecs from the black hole*”. 
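Two back-of-envelope numbers help to connect these results with reverberation mapping. The sketch below converts the 0.12-parsec ring radius into a light-travel time, which is the kind of delay a light echo would show, and estimates the orbital speed of gas at that radius around a black hole of 300 million solar masses. It assumes a circular Keplerian orbit and rounded physical constants, so the outputs are order-of-magnitude guides only.

```python
import math

# Rounded physical constants (SI units).
G = 6.674e-11          # gravitational constant
SOLAR_MASS = 1.989e30  # kg
PARSEC = 3.0857e16     # m
LIGHT_SPEED = 2.998e8  # m/s
DAY = 86_400.0         # s

def light_travel_days(radius_pc):
    """Time for light to cross the given distance, in days."""
    return radius_pc * PARSEC / LIGHT_SPEED / DAY

def circular_speed_km_s(mass_solar, radius_pc):
    """Speed of a circular Keplerian orbit, v = sqrt(G M / r), in km/s."""
    return math.sqrt(G * mass_solar * SOLAR_MASS / (radius_pc * PARSEC)) / 1000.0

delay_days = light_travel_days(0.12)
speed_km_s = circular_speed_km_s(3e8, 0.12)
print(f"light-travel time: about {delay_days:.0f} days")
print(f"circular orbital speed: about {speed_km_s:.0f} km/s")
```

The same functions applied to the 0.08 to 0.34 parsec range from reverberation mapping give light-travel times from roughly three months to about a year.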

For astronomers, the excitement about the 
current work is not because the results have 
fundamentally changed our understanding 
of quasars, but rather because this impressive 
technological advance enables an independent 
cross-check of optical-reverberation mapping 
— the most widely used method for determin- 
ing the structure of gas around supermassive 
black holes. Optical reverberation has been 
measured in roughly 60 quasars", and the 
inferred properties of the gas strongly correlate 
with the luminosity of the quasar and the mass 
of the central black hole. 

These correlations have been applied to large 
samples that comprise thousands of quasars. 
They have thereby informed our understand- 
ing of far-reaching aspects of astronomy, from 
the co-evolution of black holes and galax- 
ies over cosmic time to the rate at which the 
expansion of the Universe is accelerating. 
Having an independent cross-check from spa- 
tially resolved interferometric observations, as 
reported by the GRAVITY Collaboration, is 
valuable for confirming several key findings 
in astrophysics that rely on the robustness of 
reverberation-mapping results. 

It is important to keep in mind that the 
results presented in the paper are based
on one particular quasar. The GRAVITY
Collaboration observed 3C 273 because it was 
the best target for optical interferometry. How- 
ever, the quasar is by no means the best target 
for reverberation mapping, which makes it 
difficult to compare the results from these two 
methods critically. 

Going forward, the GRAVITY instrument 
should be capable of spatially resolving the 
dynamics and orientations of the broad- 
line region in about ten other quasars''. To 
best corroborate or dispute sizes and struc- 
tures inferred from reverberation mapping, 
coordinated campaigns on the same quasars 
using two independent techniques must be 
carried out. The GRAVITY instrument is at 
the beginning of its scientific operations, and 
these early technical achievements bode well 
for future investigations that peer deeper into 
the hearts of quasars.


A bacterium’s enemy
isn’t your friend 


The bacterium Staphylococcus aureus is a leading cause of hard-to-treat human 
infections. It now seems that, if the bacterium is infected by a virus, a viral enzyme 
helps the microbe to evade detection by the immune system. SEE LETTER P.705 


MICHAEL S. GILMORE & ONA K. MILLER 


Microorganisms thrive on our body’s
surfaces. The species present are not
just a random assembly; rather, they 
are a community of organisms that are par- 
ticularly well adapted to the local conditions 
of temperature, moisture, nutrient availability 
and host defences’. Staphylococcus aureus is 
one of our most common bacterial residents. 
It usually lives in nasal, respiratory and repro- 
ductive tissues without causing disease, yet, 
unlike many other resident bacteria, S. aureus 
has the capacity to give rise to a potentially 
deadly infection’. 
During the past 50 years’, the resistance of 
S. aureus to antibiotics has become an increas- 
ing problem, and strains of the bacterium 
termed methicillin-resistant S. aureus 
(MRSA), which are resistant to treatment with 
the antibiotic methicillin and other methylated 
penicillin-based antibiotics, cause both hos- 
pital- and community-acquired infections 
around the globe. On page 705, Gerlach et al.’ 
describe a previously unknown mechanism 
whereby viruses influence whether MRSA is 
recognized by the immune system, shedding 
light on a process that might tip the balance 
in determining whether this bacterium will be 
harmless or disease-causing. 
Staphylococcus aureus belongs to the
Gram-positive group of bacteria, and has
been described as existing on the borderline 
between being a normal human microbial 
resident and a disease-causing organism’. 
This bacterium seems to have the capacity 
to probe for signs of host weakness, such as 
reduced immune defences caused by disease. 
When this is detected, the bacterium can 
increase its population to a level that can cause 
the death of the host*. Factors that regulate 
host-microbial interactions are complex, and 
in addition to host defences, such interactions 
can be influenced by the presence or absence of 
other bacteria’. Gerlach and colleagues report 
that viruses can also be part of the mix that 
influences host-microbial interactions in the 
context of MRSA. 

In Gram-positive bacteria, the cell wall 
contains polymers known as wall teichoic 
acids (WTA), which are made up of ribitol 
phosphate or glycerol phosphate molecules 
and can constitute up to half of the cell- 
wall mass®. Unlike the other main cell-wall 
component, peptidoglycan, which forms a 
porous and comparatively insoluble mesh- 
work, WTA form a highly hydrated, gel-like 
material that fills much of the space between 
peptidoglycan strands. WTA provide a soluble
matrix through which all substances
pass before reaching the bacterial cell membrane,
and therefore affect bacterial access
to ions, nutrients, proteins and antibiotics’.
In S. aureus, WTA are composed of units of
D-ribitol phosphate, which are crosslinked to
the peptidoglycan (Fig. 1). WTA function is
tuned by attachments of the amino acid D-alanine
and of N-acetylglucosamine (GlcNAc)’
molecules to the ribitol-phosphate polymer.

Figure 1 | Viral infection of a bacterium can alter the host’s immune response to the microbe. a, The
bacterium Staphylococcus aureus is a resident of the human body. Its outer surface is coated with layers
of the polymer peptidoglycan crosslinked to wall teichoic acids (WTA), which are polymers of ribitol phosphate
molecules. The bacterial enzyme TarS modifies WTA, generating a form that has the molecule GlcNAc
attached at carbon atoms in the C4 position in the ribitol. Human antibodies against S. aureus often target
WTA. b, Gerlach et al.’ report that some antibiotic-resistant strains of S. aureus, which are associated with
difficult-to-treat infections, have been infected with a virus called a phage. The phage DNA encodes an
enzyme called TarP that attaches GlcNAc to WTA at carbon atoms in the C3 position of ribitol rather than
in the normal C4 position. In studies of mice and of human cells, the authors find that these TarP-modified
WTA trigger an immune response that is weaker than the response against TarS-modified WTA.

Gerlach and colleagues decided to 
investigate whether bacterial evasion of 
immune-system defences might be one of 
the reasons that MRSA strains can reach high 
enough bacterial numbers to cause disease. 
The authors studied the genome sequences 
of MRSA strains to identify genes encoding 
enzymes that modify WTA. This revealed that 
some MRSA strains encode an enzyme called 
TarP that catalyses the addition of GlcNAc 
to D-ribitol phosphate at a particular carbon 
atom (known as C3) in the ribitol. Normally, 
GlcNAc is added at a different position, the 
C4 carbon, by the action of a related enzyme
called TarS. 

Surprisingly, the TarP-encoding sequence 
is of viral origin, and is found in S. aureus as 
a result of infection by a bacterial virus called 
a phage. TarP is dominant over its bacterial 
counterpart, TarS — that is, if both enzymes 
are present, the GlcNAc linkage is made on 
the C3 carbon of ribitol, rather than on the 
C4 carbon. S. aureus is normally held in check 
because the immune system has the ability to 
detect it. However, the authors found that, in 
mice, the form of WTA made by TarP action is 
less likely to trigger an immune response than
is the form of WTA generated by TarS.

This virus-mediated change to the S. aureus 
cell wall reported by Gerlach and colleagues 
is important for two reasons. First, it high- 
lights the fact that a fragile truce between 
host and resident microbe can be affected by 
the intervention of a third party with its own 
vested interests. Second, at a time that some®
have called the beginning of a ‘post-antibiotic
era’, given the rise in antibiotic-resistant
bacteria and the limited development of new
antibiotics reaching the clinic, there is a pressing
need to develop new strategies to
manage infection. We are now at the
dawn of a clinical era in which the goal
will be to precisely manage human and
microbial interactions to promote
health and limit disease. Antibiotics will continue
to have a key role, as undoubtedly will
other approaches, including the replacement
of a person's gut microbes using techniques
such as faecal transplants, or the use of phage-mediated
elimination of undesirable microbes.
Determining the best approach will be helped
by the development of new diagnostic tools
and a clearer understanding of the nature of
human and microbial interactions. If deciding
whether to take an approach based on a
vaccine or possibly using phage treatments
in the future, key considerations will include
knowing how a bacterium’s susceptibility 
to phage infection varies, and determining 
whether the presence of phage DNA in a bac- 
terial genome affects the dynamics between 
human cells and the microbes that colonize 
the body. 

We do not yet know whether the phage- 
mediated alteration of WTA described 
by Gerlach and colleagues affects where 
the bacteria reside on the body or the 
number of bacterial cells present. We also 
lack a clear understanding of whether the 
antistaphylococcal WTA-targeting antibodies 
that most people have, and which do not seem 
to be protective against infection in immune-deficient
individuals, are a ‘distraction’ imposed
by the presence of S. aureus. This distraction 
would keep the immune system busy generating 
antibodies that end up in ineffective locations 
such as the bloodstream and do not elimi- 
nate the microbe. Alternatively, this low-level 
immune warfare could represent a stalemate 
between the host and its resident bacteria. 

It is clear that phage-encoded TarP changes 
the immune reactivity of S. aureus. In a
model system of human immune cells grown 
in vitro, the authors found that S. aureus strains 
encoding TarP were cleared from the system 
less effectively than were S. aureus strains 
that lacked TarP. Similar phage-mediated 
changes in a bacterial cell surface that alter 
antibody recognition of the microbe have 
been reported’ for the disease-causing Gram- 
negative bacterium Shigella flexneri. 

Gerlach and colleagues’ work, as well as 
that of others in this area, demonstrates that 
the balance between host and microbes is a 
dynamic one. The discovery that phages can 
have a role in tipping the delicate balance 
between S. aureus colonization and infection 
might one day affect the choice of approaches 
for treating MRSA infections.
Somatic APP gene recombination in
Alzheimer’s disease and normal neurons 


Ming-Hsiang Lee, Benjamin Siddoway, Gwendolyn E. Kaeser, Igor Segota, Richard Rivera, William J. Romanow,
Christine S. Liu, Chris Park, Grace Kennedy, Tao Long & Jerold Chun


The diversity and complexity of the human brain are widely assumed to be encoded within a constant genome. Somatic 
gene recombination, which changes germline DNA sequences to increase molecular diversity, could theoretically alter 
this code but has not been documented in the brain, to our knowledge. Here we describe recombination of the Alzheimer’s 
disease-related gene APP, which encodes amyloid precursor protein, in human neurons, occurring mosaically as 
thousands of variant ‘genomic cDNAs’ (gencDNAs). gencDNAs lacked introns and ranged from full-length cDNA copies 
of expressed, brain-specific RNA splice variants to myriad smaller forms that contained intra-exonic junctions, insertions, 
deletions, and/or single nucleotide variations. DNA in situ hybridization identified gencDNAs within single neurons that 
were distinct from wild-type loci and absent from non-neuronal cells. Mechanistic studies supported neuronal ‘retro- 
insertion’ of RNA to produce gencDNAs; this process involved transcription, DNA breaks, reverse transcriptase activity, 
and age. Neurons from individuals with sporadic Alzheimer’s disease showed increased gencDNA diversity, including 
eleven mutations known to be associated with familial Alzheimer’s disease that were absent from healthy neurons. 
Neuronal gene recombination may allow ‘recording’ of neural activity for selective ‘playback’ of preferred gene variants 
whose expression bypasses splicing; this has implications for cellular diversity, learning and memory, plasticity, and
diseases of the human brain.


The diversity of neuronal form and function is intrinsic to the human 
brain, but its basis remains largely unknown. Early speculations 
involved gene recombination’, analogous to the mechanism of anti- 
body diversification that was later identified’, but this has not been 
described in the brain**. Nevertheless, later identification of genomic 
mosaicism’, which arises somatically to produce brain cells with 
distinct if seemingly random genomic changes, suggested genome 
dynamism that might include gene recombination. Genomic mosa- 
icism was first identified in neural progenitor cells and neurons as 
aneuploidies and DNA content variation, both representing large 
copy number variations (CNVs)**®. Randomly distributed, smaller 
megabase-scale CNVs, LINE1 repeat elements, and single nucleotide
variations (SNVs) were subsequently identified. Genomic mosaicism 
can influence cell survival and gene transcription, but somatic gene 
recombination of specific genes has not been reported*’. 

A candidate gene for neuronal recombination is APP, which shows 
mosaic CNVs in normal human brains. These CNVs are increased 
in sporadic Alzheimer’s disease (SAD)¹⁰, the most common form of
Alzheimer’s disease. APP is central to the amyloid hypothesis wherein
APP is cleaved by secretases to form toxic amyloid-β (Aβ) peptides and
plaques, causing Alzheimer’s disease¹¹. Constitutive APP mutations and
duplications are believed to cause rare forms of familial Alzheimer’s 
disease (FAD) and Alzheimer’s disease neuropathology in Down syn- 
drome (trisomy 21 with 3 APP copies), supporting the idea that they 
have a pathogenic role when present mosaically in SAD!?-'4. We previ- 
ously identified mosaic, neuronal APP CNVs that showed heterogene- 
ous signals that might be explained by gene recombination’”. However, 
interrogation of APP genomic loci (about 0.3 Mb) using low-depth, 
short-read single-cell sequencing capable of detecting CNVs produced 
negative results that were complicated by resolution limitations®'. We 
therefore developed an alternative strategy focused on APP in small
cell populations, using nine distinct methodologies (Extended Data
Table 1).


Novel APP RNA variants in neurons 

We postulated that genomic sequence alterations in APP, existing 
mosaically, could be detected in RNA through transcriptional ampli- 
fication. Assessments were focused on small populations of nuclei 
rather than bulk samples that are dominated by annotated species 
(Extended Data Fig. 1a) to detect mosaic alterations. The workflow 
(Fig. 1a) commenced with fluorescence-activated nuclear sorting 
(FANS)'* to isolate neuronal nuclei from prefrontal cerebral cor- 
tices from both control individuals and those with verified SAD, 
which were run in parallel (Extended Data Table 2). Groups of 50 
NeuN-positive neuronal nuclei were isolated and processed for 
PCR with reverse transcription (RT-PCR; Fig. 1a) and downstream 
analysis. RT-PCR using validated primers on exon 1 and exon 18 
(Supplementary Table 1), which can amplify full-length APP cDNA 
(APP-770, NM_000484.3), detected the expected splice variants 
APP-751 (NM_201413.2) and APP-695 (NM_201414.2)!” (Extended 
Data Fig. 1b). However, multiple unexpected bands of varied sizes 
were also identified (Fig. 1b). The RT-PCR products were Southern 
blotted with ³²P-labelled APP cDNA probes (Fig. 1c), and positive
bands were cloned and Sanger sequenced. The new bands yielded 
APP cDNA sequence variants unlike any previously reported, char- 
acterized by loss of central exons with proximal and distal exons 
linked by intra-exonic junctions (IEJs) (Fig. 1d,e). Twelve novel 
RNA variant sequences with unique IEJs were identified in neurons 
(Fig. 1e) and non-neurons displayed no variants (Extended Data
Fig. 1c). IEJs were independently observed in five oligo-dT-primed
cDNA libraries: three from sorted neuronal nuclei from individuals
with SAD (Extended Data Fig. 1d) and two from commercially
produced long-read RNA-seq data sets from whole brain and temporal
lobe of patients with SAD (Extended Data Fig. 1e). Five variants
retained coding potential and seven contained premature stop codons
(Extended Data Table 3, Supplementary Table 2). One prevalent form
was characterized by an IEJ between the 24th nucleotide of exon 3 and
the 45th nucleotide of exon 16 (Fig. 1e, R3/16). Detection of R3/16
by RNA in situ hybridization (RISH) on SAD brain sections indicated
the cytoplasmic presence of variants (Extended Data Fig. 1f).
Notably, sequence complementarity of joined exons was found in all
12 IEJs, ranging in overlap from 2 to 20 nucleotides (Fig. 1e, Extended
Data Table 3, Supplementary Table 2). Amplification for a second
gene related to Alzheimer’s disease, PSEN1, did not identify variants
(Extended Data Fig. 1g).

Fig. 1 | Identification of novel APP RNA variants from small
populations of neurons. a, Fifty neuronal nuclei were sorted from human
prefrontal cortices (FCTX) (1) and used for RT-PCR (2). The resulting
RT-PCR products were screened by Southern blotting with ³²P-labelled
APP cDNA probes (3). Bands with positive signals from duplicate gels
were cloned and sequenced (4), and variants were identified (5).
b, Electrophoresis of RT-PCR products from the brains of three
non-diseased (ND) individuals and three patients with SAD, with two
populations each (a and b). APP and PSEN1 plasmids were run as positive


gencDNA sequences in neuronal genomes 

The existence of previously unidentified RNA variants raised the ques- 
tion of whether this transcriptional heterogeneity originated from 
mosaic variation in DNA. We carried out high-stringency amplifi- 
cation, using the APP primers previously used for RNA and cDNA 
analyses, on RNase-treated DNA extracted from sets of 20 neuronal 
nuclei from both healthy brains and those with SAD (Fig. 2a). PCR of 
the wild-type APP genomic locus was not possible because of its length 
(about 300 kb) (Fig. 1d). However, PCR on genomic DNA generated 
similar-sized bands to novel RNA variants (Fig. 2b, Extended Data 
Fig. 2a). Sanger sequencing revealed multiple gencDNAs and seven of 
eight were identical to those identified in RNA (Fig. 2c). We validated 
the presence of APP gencDNAs in neurons using multiple, distinct 
primer sets (Extended Data Fig. 2b, c). We did not detect gencDNAs 
in DNA isolated from human lung fibroblasts (IMR-90), human 
embryonic kidney cells (HEK-293), or non-neuronal nuclei from the 
brains of individuals with or without SAD (Extended Data Fig. 2d, e). 
Amplification of PSEN1 did not produce products from genomic DNA 
(Fig. 2b, Extended Data Fig. 2a). 
and negative controls for Southern blotting. c, Southern blot of RT-PCR
products. Arrowheads indicate examples of bands from b that were cloned 
and Sanger sequenced. d, Structure of human APP genomic locus and 
spliced APP-770 full-length cDNA drawn to scale; the colour scheme 
remains consistent throughout all figures. e, APP RNA variants identified 
by RT-PCR. The sequences of homology regions forming IEJs are shown. 
Variant sequences deviating from Refseq are shown in red with asterisks. 
R, RNA identified; #/#, exon-exon junction; .#, for multiple unique 
junctions. 
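
The short overlaps reported at the intra-exonic junctions (2 to 20 nucleotides) can be illustrated with a minimal Python sketch. It assumes the overlap is a run of identical bases shared by both exons around the join; the function name, the toy sequences and the cut positions are all hypothetical and are not taken from the APP sequence or from the authors' analysis.

```python
def junction_microhomology(upstream: str, downstream: str,
                           cut_up: int, cut_down: int) -> str:
    """Return the bases shared by both sequences around a junction.

    The joined molecule is upstream[:cut_up] + downstream[cut_down:].
    Shared bases on either side of the cut make the exact join
    position ambiguous; that ambiguous stretch is what is returned.
    """
    # Walk leftwards: bases just before the cut in both sequences.
    left = 0
    while (left < cut_up and left < cut_down
           and upstream[cut_up - 1 - left] == downstream[cut_down - 1 - left]):
        left += 1

    # Walk rightwards: bases just after the cut in both sequences.
    right = 0
    while (cut_up + right < len(upstream)
           and cut_down + right < len(downstream)
           and upstream[cut_up + right] == downstream[cut_down + right]):
        right += 1

    return upstream[cut_up - left:cut_up + right]


# Toy sequences standing in for two exons (hypothetical, not APP).
exon_a = "AAACCGTGGATT"
exon_b = "TTTGTGGACCC"
overlap = junction_microhomology(exon_a, exon_b, cut_up=6, cut_down=4)
joined = exon_a[:6] + exon_b[4:]
```

In this toy case the shared stretch is five bases long, which falls inside the 2 to 20 nucleotide range described in the text.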


gencDNA detection by non-PCR methods 
To validate the presence of APP gencDNA junctions within single neu- 
ronal genomes without polymerase-based amplification, we developed 
DNA in situ hybridization (DISH). Our method extensively modified 
the sample preparation and hybridization protocols (see Methods) of 
a commercial RISH product, BaseScope (ACD), to recognize genomic 
sequences. BaseScope technology uses paired ISH probes to elimi- 
nate hybridization artefacts and can detect specific junctions. Two 
DISH probes were extensively used (Extended Data Table 4): one that
recognized a common gencDNA sequence via the exon 16-exon 17
junction (DISH16/17), which spans the Aβ coding region of APP; and
one that recognized the newly identified IEJ formed between exons
3 and 16 (DISH3/16). Bound probes were visualized as red dots with
varying diameters. All probes passed multiple specificity requirements 
involving positive and negative controls. Sense and antisense DISH 
probes produced similar results in RNase-treated neuronal nuclei 
from individuals with SAD (Fig. 2d-i). By comparison, RNA signals 
were detected only using the antisense probes (Extended Data Fig. 1f); 
therefore, sense probes were used in all subsequent DISH analyses. 
Critically, DISH signals were eliminated by destruction of the target 
sequence by specific (but not off-target) restriction enzyme diges- 
tion (Fig. 2j-m, Extended Data Fig. 2f). In addition, no DISH signal 
was detected on cells infected with retroviruses containing wild-type 
human genomic APP sequences lacking target sequences (Extended 
Data Fig. 2g, h). Notably, double labelling with dual DISH probes 
recognizing the intron 2–exon 3 wild-type genomic sequence combined
with DISH3/16 or DISH16/17 demonstrated that APP gencDNAs
did not usually co-localize with the wild-type locus (Fig. 2n). Thus, 
DISH detected specific APP gencDNA junctions within genomic DNA 
without polymerase-dependent amplification, revealing multiple loci 
distinct from germline APP alleles. 

A completely independent approach also identified APP gencDNAs
without primary PCR amplification by using a custom Agilent
SureSelect targeted DNA pull-down (Extended Data Fig. 2i), which
showed unbiased genomic coverage across the entire APP genomic
locus (all introns and exons including exon 8; Extended Data Fig. 2j).
Analysis of DNA from 40,000 neuronal nuclei from individuals with
SAD identified all previously detected gencDNA exon-exon junctions
excluding exon 8, which is absent from the brain-specific APP mRNA
splice variants APP-751 and APP-695 (Fig. 2o; see also Fig. 1e).

Fig. 2 | APP gencDNAs identified by DNA polymerase-dependent and
-independent methods. a, FANS-isolated neuronal nuclei from human
prefrontal cortices (1) were used for genomic DNA PCR (2), DISH (3),
and custom target enrichment followed by deep sequencing (4).
b, Electrophoresis of genomic DNA PCR products with APP and PSEN1
primer sets, from neurons from normal brains or brains from individuals
with SAD with two replicates (a and b). Non-template control (NC) and
positive control (PC) with indicated plasmids are shown. c, Cloning and
Sanger sequencing revealed multiple gencDNA sequences. d-i, DISH was
performed with sense and antisense probes targeting the exonic 16/17
junction (d, f, g; sense n = 339 and antisense n = 335), and the intra-exonic
3/16 junction (e, h, i; sense n = 490, antisense n = 484) on neuronal nuclei
from individuals with SAD. f-i, Relative average number of foci
(f, h; points represent average of independent experiments) and frequency
distributions (g, i) showed no significant differences (unpaired, two-tailed
Student's t-test). j-m, Restriction enzyme (RE) digestion using MluCI
(j, k) and PstI + MslI (l, m) to eliminate 16/17 (−RE n = 349, +RE
n = 440) and 3/16 (−RE n = 367, +RE n = 340) target sequences,
respectively. Statistical significance on all bar graphs was determined using
unpaired, two-tailed Student's t-test. n, Dual DISH with intron 2/exon 3
(red) genomic locus and 16/17 or 3/16 probes (green). o, Schematic of
APP cDNA and genomic exon-exon junctions identified by Agilent
SureSelect enrichment of the APP locus and Illumina sequencing; reads
below span two exon-exon junctions. NS, not significant. Error bars show
s.e.m. Scale bars, 10 μm.

SAD. a, Neuronal nuclei from prefrontal cortices of individuals with
or without SAD were sorted (1) and used for genomic DNA PCR (2).
Multiple reactions were pooled for library preparation (3) to enable
SMRT-CCS (more than 20 passes) (4). b, Exon-exon junctions identified.
c, Key for outermost circle of d and e, representing the sum of changes
at each genomic location. d, e, Concentric circle plots of the APP locus
depicting IEJs (central lines), deletions (del), insertions (ins) and SNVs
sequenced from the brains of five individuals with SAD (d) and five


Distinct APP gencDNAs in SAD 

The diversity of gencDNA sequences was assessed by a distinct tech- 
nical approach, single molecule real-time (SMRT) circular consensus 
sequencing (CCS), which enables high-certainty, long-read calls to be 
produced by multiple passes over the same template. gencDNAs were 
enriched by multiple PCR reactions on small neuronal populations 
from five individuals with SAD (149 reactions from 96,424 nuclei) 
and five healthy brains (244 reactions from 162,248 nuclei; Fig. 3a). 
Samples were pooled for library preparation and SMRT-CCS. Of note, 
more non-diseased nuclei than diseased nuclei were required to pro- 
duce sufficient product for sequencing. We identified 6,299 unique 
sequences (10% in frame; Extended Data Fig. 3a, b) including 45 dif- 
ferent IEJs, in neuronal nuclei from the brains of individuals with SAD, 
and 1,084 unique sequences (12.1% in frame; Extended Data Fig. 3a, c), 
including 20 IEJs, in neuronal nuclei from non-diseased brains
(Fig. 3b-i). 
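
To put the two cohorts on a common scale, the reported totals can be normalized by the number of input nuclei. The Python sketch below does only that arithmetic on the figures quoted above; the class and method names are hypothetical, and because the libraries were built from pooled PCR products, the resulting ratio is a rough illustration and not a measurement reported by the authors.

```python
from dataclasses import dataclass


@dataclass
class Cohort:
    """Totals quoted in the text for one group of brains."""
    name: str
    reactions: int
    nuclei: int
    unique_sequences: int
    iejs: int

    def unique_per_10k_nuclei(self) -> float:
        # Unique gencDNA sequences per 10,000 input nuclei.
        return self.unique_sequences / self.nuclei * 10_000


sad = Cohort("SAD", reactions=149, nuclei=96_424,
             unique_sequences=6_299, iejs=45)
nd = Cohort("non-diseased", reactions=244, nuclei=162_248,
            unique_sequences=1_084, iejs=20)

# Ratio of normalized yields between the two cohorts.
fold = sad.unique_per_10k_nuclei() / nd.unique_per_10k_nuclei()

for cohort in (sad, nd):
    print(f"{cohort.name}: {cohort.unique_per_10k_nuclei():.1f} "
          f"unique sequences per 10,000 nuclei")
print(f"fold difference (SAD / non-diseased): {fold:.1f}")
```

On these totals the normalized yield is close to tenfold higher in the SAD samples, in line with the observation that more non-diseased nuclei were needed to obtain enough product.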

Critically, both qualitative and quantitative differences in the 
sequences of gencDNA variants distinguished the brains of individ- 
uals with SAD from healthy brains (Fig. 3b-i). Distinctions included 
gencDNAs with novel IEJs and SNVs (Fig. 3d, e), which were far more 
prevalent in the brains of individuals with SAD. By contrast, gencD- 
NAs of the canonical neuronal splice variants, APP-751 and APP-695, 
predominated in non-diseased brains, and brains from individuals 
with SAD showed reduced APP-751 and no APP-695 (Fig. 3f, h). 
Notably, 11 SNVs that had been previously published as pathogenic 
FAD mutations (Fig. 3d, j, Supplementary Tables 3, 5), including the 
Indiana mutation”, were present in neurons from individuals with 
SAD. No FAD mutations were detected in non-diseased brains (Fig. 3e, 
Supplementary Tables 4, 6). 


gencDNA formation in cell lines 

APP gencDNAs lacking introns, and the presence of brain-specific iso- 
forms (APP-751, APP-695), support origins of gencDNAs from RNAs 
that must involve reverse transcription. To model gencDNA produc- 
tion in cell lines, we expressed APP-751 cDNA (Fig. 4a) in a Chinese 
hamster ovary (CHO) cell line with endogenous reverse transcriptase
activity. Initial results did not show gencDNAs, but induction of
DNA strand breaks by H₂O₂ produced novel gencDNAs (Fig. 4b, c).
Additionally, endogenous reverse transcriptase activity was
required to produce gencDNAs, based on results using the nucleoside
reverse transcriptase inhibitors (RTi) abacavir (ABC) and azidothymidine
(AZT) (Fig. 4b). Variant RNAs were also dependent on reverse
transcriptase activity (Extended Data Fig. 4). Endogenous reverse
transcriptase activity was confirmed in CHO cells and further identified in
human prefrontal cortex (Fig. 4d–h), consistent with gencDNA production
from RNA intermediates and reverse transcription (Fig. 4i).

Fig. 4 | Mechanistic studies of gencDNA formation in culture.
a, Timeline of CHO cell experiments. RTi, AZT (100 μM) and ABC
(10 μM); transfection, APP-751; DNA damage, H₂O₂ at 5 μM (+) and
50 μM (++). b, Gel electrophoresis with red arrowheads indicating cloned
and sequenced bands. c, New induced gencDNA variants. d, Reverse
transcriptase (RT) activity was analysed in SuperScript II (SSII) positive
controls, CHO cell lysate, and human brain lysate. e, Four independent
experiments showed decreased reverse transcriptase activity in CHO cells
in response to the RTi azidothymidine triphosphate (AZT-TP). Colours
represent individual experiments. f-h, Relative reverse transcriptase
activity in SSII controls, CHO cells, and brain samples (three independent
experiments with three biological replicates). Statistical significance
was determined using ordinary one-way ANOVA with Sidak’s multiple
comparisons test. ***P = 0.003, ****P < 0.0001. NS, not significant
(f, g, P > 0.9999; h, P = 0.3095). Error bars show s.e.m. i, Proposed model
of reverse transcriptase activity in the formation of gencDNAs.


Increased gencDNAs in SAD and J20 neurons 

We further explored the relationships between gencDNAs and SAD 
using DISH. We examined two gencDNA junctions, DISH16/17 and
DISH3/16, in neurons from six individuals with verified SAD and six
non-diseased brains (Fig. 5a–f, Extended Data Fig. 5a–f, Extended Data
Table 2; average age 83.5 and 86.7 years, respectively). The number of 
red foci in neurons from individuals with SAD was three- to fivefold 
higher than in non-diseased neurons and ranged from 0 to a maxi- 
mum of 13 in SAD nuclei. Rare foci were observed in non-neuronal 
nuclei but were not statistically increased in SAD (Fig. 5a–f, Extended
Data Fig. 5a-f). Increased gencDNAs in neurons from individuals with 
SAD raised the question of whether gencDNAs could give rise to toxic 
proteins. We therefore tested the cytotoxicity of three APP RNA variants
(R2/18, R3/14, and R3/16), which were translated in vitro, and
found that two of the three variants induced cell death in SH-SY5Y 
cells (Extended Data Fig. 5g, h). 
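
The foci comparison described above rests on two simple summaries: the mean number of foci per nucleus, and the share of nuclei with 0, 1, 2 or 3 or more foci. A minimal Python sketch of both follows. The function names and the per-nucleus counts are hypothetical toy values chosen for illustration; they are not data from the study.

```python
from collections import Counter


def mean_foci(foci_per_nucleus: list[int]) -> float:
    """Average number of DISH foci per nucleus."""
    return sum(foci_per_nucleus) / len(foci_per_nucleus)


def foci_distribution(foci_per_nucleus: list[int]) -> dict[str, float]:
    """Percentage of nuclei with 0, 1, 2 and 3 or more foci."""
    # Cap every count at 3 so that all higher counts share the '3+' bin.
    bins = Counter(min(n, 3) for n in foci_per_nucleus)
    labels = {0: "0", 1: "1", 2: "2", 3: "3+"}
    total = len(foci_per_nucleus)
    return {labels[k]: 100 * bins.get(k, 0) / total for k in range(4)}


# Hypothetical toy counts, one value per scored nucleus.
sad_counts = [0, 1, 2, 3, 5, 4, 0, 2, 6, 1]
nd_counts = [0, 0, 1, 0, 2, 0, 1, 0, 1, 1]

fold_change = mean_foci(sad_counts) / mean_foci(nd_counts)
sad_bins = foci_distribution(sad_counts)
nd_bins = foci_distribution(nd_counts)
```

Capping counts at three keeps rare nuclei with many foci from stretching the histogram, while the mean still reflects them.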

The J20 mouse model of Alzheimer’s disease forms Aβ plaques that
accumulate with age. These mice harbour multiple copies of a human
APP transgene containing the Swedish (K670N/M671L) and Indiana
(V717F) mutations, driven by a neuron-specific platelet-derived growth
factor-β (PDGF-β) promoter to produce selective, high expression in
neurons, with little or no expression in non-neuronal cells!®. DISH
probes for human APP did not detect the endogenous mouse locus
(Fig. 5g, Extended Data Fig. 6b). DISH3/16 identified enriched signals
in J20 neuronal nuclei, contrasting with low levels in non-neuronal
nuclei from the same mice (Fig. 5g, h, Extended Data Fig. 6a). The more
prevalent gencDNA sequence recognized by DISH16/17 was also highly
enriched in neurons. Notably, DISH16/17 demonstrated an age-dependent
increase in the area of gencDNA foci over a 2.3-year period, a pattern
of change that was not observed in non-neuronal nuclei (Fig. 5i, j,
Extended Data Fig. 6c). Use of cells infected with retroviral proviruses
containing 0, 1, or 2 copies of the DISH16/17 target sequence demonstrated
that DISH is semiquantitative and reflects DNA copy numbers
(Extended Data Fig. 6d-f). The neuron-selective increase in area of foci 
occurs during adult life, long after cerebral cortical neurogenesis has 
ceased’, further supporting the theory that neuronal gene transcrip- 
tion generates gencDNAs. 


Discussion 

Human neuronal APP gene recombination was identified in brains 
from healthy controls and individuals with SAD. It was characterized 
by the mosaic presence of thousands of distinct gencDNA variants 
that enter neuronal genomic DNA through a process involving APP 
transcription that is influenced by neural activity, DNA strand breaks 
and reverse transcription (Supplementary Discussion). APP gencDNAs 
bear some resemblance to, but are fundamentally distinct from, pro- 
cessed pseudogenes” (non-coding, germline remnants of evolution- 
arily retrotransposed mRNAs”! that can be active in cancers”*) and 
LINE1 repeat elements (which encode an active reverse transcriptase
(ORF2)*° to allow potential retrotransposition in mitotic cells, includ- 
ing within the developing brain®”*°). By comparison, APP gencD- 
NAs manifest as thousands of distinct genomic variants derived from 
a cellular gene, contain IEJs and myriad SNVs, can undergo multiple 
‘retro-insertions’ into post-mitotic neuronal genomes, and appear 
capable of being actively transcribed and translated to produce variant 
bioactive products that are relevant to both normal and diseased states. 

Constitutive mutations or APP CNVs are considered causal in FAD 
and Down syndrome, raising the possibility that previously reported 
somatic APP exonic CNVs contribute mechanistically to SAD!, which 
can be explained by the somatic gene recombination identified here. 
Proof-of-concept data from individuals with SAD identified a marked 
shift in the forms and abundance of gencDNAs when compared with 
healthy controls (Figs. 3, 5), including the three- to fivefold increase in 
gencDNAs in all brains from individuals with SAD examined (Fig. 5). 
Notably, we identified 11 somatic SNVs that were previously identi- 
fied as being pathogenic in FAD”, which were absent from non-dis- 
eased controls. Other SNVs, as well as myriad genomic alterations, 
may also contribute to SAD through both classical and non-classical 
mechanisms. 

Classical mechanisms that support the amyloid hypothesis involve 
production of toxic Aβ peptides and plaque formation. gencDNAs,
including those with FAD-associated mutations, are likely to represent
a source of secretase-cleaved substrate for the production of Aβ, as well
as a potential source of toxic products that do not require secretase 
cleavage (Extended Data Fig. 5). Non-ATG translation could potentially 
occur for out-of-frame variants””. Non-classical mechanisms might 
involve RNA pathologies”’ or maximum limits to gencDNA integration 
per neuron, beyond which neurodegeneration occurs via genome instability,
akin to deleterious mobile elements”’. The potential diversity of
protein variants produced by gencDNAs, and other non-protein mechanisms,
may help to explain the failure of therapeutic trials targeting Aβ
and related enzymologies, especially those targeting single molecular
entities*”.

Fig. 5 | Proof-of-concept correlation between gencDNAs and SAD.
a-f, Nuclei sorted from the brains of six individuals with SAD and six
individuals without SAD were analysed by DISH16/17 (a-c) and DISH3/16
(d-f). a, d, Representative DISH images. b, e, Average number of foci per
nucleus was increased in neurons from individuals with SAD (one-way
ANOVA with Holm-Sidak’s multiple comparison test). c, f, Frequency
distributions displaying the percentage of nuclei with 0, 1, 2 and 3 or
more (3+) foci (two-way ANOVA with Tukey’s multiple comparison
test). g, Representative DISH3/16 of J20 and wild-type (WT) nuclei. h, The
percentage of nuclei with one or more foci was increased in J20 neurons
(ordinary one-way ANOVA with Sidak’s multiple comparisons test, J20+

The largest risk factor for SAD is age, and the age-related increase 
in gencDNA variants in neurons offers a possible explanation for the 
decades of life required for SAD to manifest. Neuronal APP transcrip- 
tion promotes gencDNA generation both in cell culture and in J20 
neurons in vivo (Figs. 4, 5). This is consistent with the increase in APP 
transcription that was previously linked to SAD incidence*!, and the 
gene encoding the SAD risk factor ApoE4*”. Notably, the dependence 
of gencDNA on reverse transcriptase activity might be relevant to the 
statistical rarity of proven cases of Alzheimer’s disease in individuals 
with HIV infection who are more than 65 years old*?*4, and who have 
received prolonged, combined anti-retroviral therapy (CART), which 
includes reverse transcriptase inhibitors. If confirmed, this observa- 
tion would suggest the immediate use of FDA-approved cARTs or 
modified combinations containing reverse transcriptase inhibitors to 
treat SAD, Down syndrome and perhaps FAD. Additionally, processes 
that produce DNA breaks, such as head injury, that have been linked 
to Alzheimer’s disease**-*’ are consistent with gencDNA production 
requiring DNA breaks. Thus, gencDNAs and their production have 
properties relevant to a range of Alzheimer’s disease mechanisms and 
the development of new therapeutic strategies. 

The presence of APP gencDNAs in non-diseased neurons is likely 
to reflect the normal roles of APP?®, including synaptic function; here, 
APP gencDNAs might provide an increased repertoire of protein spe- 
cies, contributing to synaptic diversity. Additional genes may be tran- 
scriptionally modified and genomically retro-inserted in response to 
selective activities in neuronal populations. Such a mechanism might 
enable preferential gene re-expression that bypasses splicing or further 
RNA modification. More broadly, gencDNAs could provide neurons 
with an activity-dependent mechanism for recording and retaining 
information over long periods of time, perhaps placing multiple forms 
of a gene under transcriptional control distinct from a wild-type locus, 
which could be produced through diverse genomic integration sites 
that remain to be determined. Such a process could have relevance 
to known neuronal functions that depend on transcriptional activity, 
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versus WT+ P= 0.0253, J20+ versus J20— P=0.0371, WT-+ versus 
WT-— P=0.9267, J20— versus WT— P=0.9842). i, j, Area of DISH 617 
foci increased with age in J20 mice. i, Representative images of from mice 
aged 177, 566, 661, 728, 748 and 829 days (one animal each, number of 
nuclei interrogated is listed below box). j, Area of foci shows statistically 
significant increases with age. +, mean; line, median; box, 75th-25th 
percentiles; whiskers, 90th-10th percentiles (non-parametric Kruskal- 
Wallis with Dunn’s multiple comparisons test). *P < 0.05, **P< 0.01, 
*** P< 0.001, ****P < 0.0001. NS, not significant. Detailed P values for b, 
c, e, f, j are listed in Extended Data Figs. 5, 6. Error bars show s.e.m. Scale 
bars, 10 um unless otherwise noted. 


including Hebbian plasticity’’, synaptic wiring”’, learning and mem- 
ory*', and cognition”. Thus, gencDNA production may represent both 
a ‘recording’ and a ‘playback’ mechanism for expressing a symphony 
of variants beyond wild-type gene forms. It would be surprising if APP 
were the only gene to undergo this form of recombination, which might 
influence distinct, normal brain functions as well as contributing to 
brain disorders such as Alzheimer’s disease. 


Online content 

Any methods, additional references, Nature Research reporting summaries, source 
data, statements of data availability and associated accession codes are available at 
https://doi.org/10.1038/s41586-018-0718-6.

METHODS


Human brain tissue and J20 mice. Fresh frozen human brain tissue was provided 
by the University of California San Diego (UCSD) Alzheimer’s Disease Research 
Center (ADRC) and the University of California Irvine (UCI) Institute for Mind 
Impairments and Neurological Disorders (MIND). 

J20 transgenic mice (B6/Cg-Tg(PDGFB-APPSwInd)20Lms/2JMmjax) were 
purchased from The Jackson Laboratory and housed in IACUC approved animal 
facilities in accordance with applicable laws and regulations at Sanford Burnham 
Prebys Medical Discovery Institute. Sex (F/M) and age (days) of mice used for 
experiments are listed: F177, M566, M661, M661, M728, F748, F829, and M861. 
Sample sizes were estimated based upon preliminary data without additional sta- 
tistics. Samples were allocated randomly and in situ hybridization quantification 
was blinded for statistical assessments. 

Nucleus extraction and FANS. Human and mouse brain nuclei were isolated as 
described previously. For in situ hybridization analyses, isolated nuclei were fixed
in 1:10 diluted buffered formalin (Fisher Healthcare) for 5 min. Prior to sorting, fixed 
or unfixed nuclei were then labelled with anti-NeuN rabbit monoclonal antibody 
(1:800) (Millipore, Germany) and Alexa Fluor 488 donkey anti-rabbit IgG (1:500) 
(Life Technologies, Carlsbad, CA), and counterstained with propidium iodide 
(PI; 50 μg/ml) (Sigma, St. Louis, MO). For DNA analyses, RNase A (100 μg/ml)
was included with all subsequent steps after initial nuclei isolation, including pri- 
mary and secondary antibody incubations. Diploid NeuN-positive and negative 
nuclei were gated by PI and immunofluorescence, and sorted into appropriate 
populations for RT-PCR, genomic DNA PCR, or in situ hybridization. FANS was 
performed on a FACSAria Fusion (BD Biosciences, Franklin Lakes, NJ) or with a 
FACS-Aria II (BD Biosciences, Franklin Lakes, NJ). 

RNA extraction and RT-PCR. All RNA extractions from 50-nuclei populations 
(NeuN-negative and positive) and bulk tissues were performed using Quick-RNA 
MicroPrep (Zymo Research, Irvine, CA) and RNeasy Mini kits (Qiagen, Valencia,
CA) according to the manufacturer’s protocols. OneStep Ahead RT-PCR (Qiagen, 
Valencia, CA) was used for RT-PCR with APP 1-18 primer sets (Supplementary 
Table 1) according to the manufacturer’s protocol. Oligo-(dT)29 primer was used to 
prime the cDNA library as indicated. Low annealing stringency PCR was carried 
out with the following thermal cycling steps for 40 cycles: 95°C 15 s, 55°C for 
APP and 52°C for PSEN1 15 s, and 68°C 2.5 min for APP and 2 min for PSEN1. 

Southern blotting. RT-PCR products were run on agarose gel, denatured, and
transferred to a positively charged nylon membrane. UV-crosslinked membranes
were incubated with denatured and purified 32P-labelled APP cDNA probes at
42°C overnight. Blots were washed four times with increasing washing stringency 
and temperature according to established protocols. Images were developed on a 
Typhoon (GE Healthcare Life Sciences) or Fujifilm FLA-5100 phosphorimager. 

DNA extraction and genomic DNA PCR. DNA extraction from isolated neuronal
nuclei populations was performed via isopropanol precipitation. In brief, nuclei
were incubated with proteinase K in 550 μl PK buffer (50 mM Tris pH 8.0, 0.1 M
EDTA, 0.1 M NaCl, 1% SDS) overnight at 55°C. Samples were then treated with
RNase cocktail enzyme mix (ThermoFisher, Waltham, MA) for 2 h, followed by
addition of 250 μl saturated NaCl. After centrifugation, supernatant was used for
DNA precipitation by isopropanol and washed three times with 70% ethanol.
DNeasy and QIAamp DNA Mini kits (Qiagen, Valencia, CA) were also used
according to the manufacturer's instructions. Purified DNA was stored at −20°C
for future use. High annealing stringency PCR was performed using either the 
FastStart PCR master mix (Sigma, St. Louis, MO) with PCR cycle settings: 95 °C 
30 s, 65°C for 30 s, and 72°C 2.5 min for 40 cycles, or the Platinum SuperFi DNA 
polymerase (ThermoFisher, Waltham, MA) with cycle settings: 98 °C 10 s, 65°C for 
APP and 52°C for PSEN1 for 10 s, and 72°C 1.5 min for APP and 1 min for PSEN1 
for 40 cycles. Primer sequences are listed in Supplementary Table 1. 

DNA in situ hybridization and RNA in situ hybridization. For DISH pretreat- 
ment, sorted nuclei were dried on Plus Gold slides (Fisher Scientific, Pittsburgh, 
PA). Nuclei were then treated with RNase cocktail enzyme mix (RNase A + RNase 
T1, 1:50) (ThermoFisher, Waltham, MA) at 40°C for 60 min, followed by fixation
in 1:10 dilution buffered formalin at room temperature for 5 min. After being 
washed with distilled water twice, slides were treated with hydrogen peroxide at 
room temperature for 10 min, target retrieval reagent at 95°C for 15 min, followed 
by protease treatment at 40°C for 10 min. Restriction enzyme was applied after 
protease treatment for 2 h as necessary in negative control experiments. DNA was 
then denatured (2x SSC, 70% formamide and 0.1% sodium dodecyl sulfate) at 
80°C for 20 min. After cooling the slides to room temperature, BaseScope probes 
were applied and incubated with nuclei at 40°C overnight. Samples were then
prepared for signal development. For RISH pretreatment, 10 μm fresh frozen human
tissue sections were fixed in 1:10 dilution buffered formalin on ice for 10 min.
After washing with PBS twice, tissue sections were placed in serially diluted ethanol
(50%, 70% and 100%), 5 min for each step. Slides were then treated with hydrogen
peroxide at room temperature for 10 min, followed by protease treatment at room
temperature for 20 min. BaseScope probes were then incubated with tissue sections
at 40°C for 2 h. Hydrogen peroxide, 10x target retrieval buffer, proteases, custom
BaseScope probes (Supplementary Table 1), and BaseScope reagent kit-RED used
for signal development were all purchased from Advanced Cell Diagnostics (ACD,
Newark, CA). Duplex BaseScope reagent kit was also purchased from ACD. Nuclei
and tissue sections were counterstained with haematoxylin. A Zeiss Axio Imager M2
microscope and ZEN2 software were used for image acquisition. Images were
thresholded, and foci number or size were quantified using ImageJ for statistical
analysis.

Agilent SureSelect hybridization enrichment and sequencing. The method 
is graphically represented in Extended Data Fig. 2i. Nuclei were isolated from 
human frontal cortex, labelled for NeuN, and NeuN-positive nuclei were isolated 
via FANS. Genomic DNA was extracted and fragmented into ~1.2 kb using soni- 
cation (Covaris, Woburn, MA). End repair reactions were performed and Illumina 
sequencing adaptors were ligated to genomic DNA. Library-prepped DNA was 
hybridized with custom Agilent SureSelect probes designed against the entire APP 
locus, including introns. Purified APP-containing genomic DNA sequences were 
then sequenced on an Illumina NextSeq (Illumina, San Diego, CA). Sequences 
were aligned to the human reference genome (GRCh38) using STAR (version 
2.5.3a) with the settings: --outSAMattributes All --outFilterScoreMinOverLread
0.8 --outSJfilterCountTotalMin 1 1 1 1. Duplicate reads were marked and removed 
using Picard (version 2.1.1). Reads were then informatically analysed using IGV, 
the UCSC Genome Browser, and a custom imaging pipeline built in R. 

SMRT sequencing. Neuronal genomic DNA was isolated as described above and 
used for APP PCR and nested APP PCR. Platinum SuperFi DNA polymerase with 
100x higher fidelity compared to native Taq (Invitrogen, Platinum SuperFi DNA 
Polymerase) was used under high annealing stringency (98°C 10s, 65°C 10s, and 
72°C 1.5 min, for 30 cycles). An aliquot of the first PCR product was used as a DNA 
template for nested PCR reactions. Multiple PCR reactions were pooled (149 reac- 
tions for Alzheimer’s disease and 244 reactions for non-diseased) and purified by 
DNA Clean and Concentrator-5 (Zymo Research, Irvine, CA) for SMRT sequenc- 
ing library preparation. PCR amplicons were repaired using SMRTbell template 
prep kit version 2.0 (PacBio) and purified using AMPure PB beads (PacBio, Menlo 
Park, CA). Adapters were ligated to DNA to create SMRTbell libraries. Sequencing 
polymerase was annealed and the SMRTbell library was loaded using Magbead 
binding. Raw bam sequencing files were converted to fastq format using the CCS 
algorithm in the SMRTLink software tool kit from PacBio. In CCS, reads were 
included in the fastq file only if 1) there were more than 20 passes of the sequenc- 
ing polymerase over the DNA insert in the zero mode waveguide well and 2) the 
predicted accuracy in SMRTLink was calculated to be greater than 0.9999. These 
cutoffs generated ultra-high accuracy reads: the median Phred score of reads used
was 93 (refs 43-46), representing 99.999999% accuracy, with further quality filtering steps
applied in our informatic analysis. This SMRT sequencing is comparable in fidelity
to Sanger sequencing.
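
The relationship between the Phred scores and accuracies quoted above can be checked with a few lines of Python. This is a minimal sketch and not the SMRTLink implementation: the function names are hypothetical, and only the standard Phred definition (Q = -10 log10 of the error probability) and the two inclusion criteria stated in the text are assumed. Under that definition a Phred score of 93 corresponds to an error probability of roughly 5 x 10^-10.

```python
import math


def phred_to_error(q: float) -> float:
    """Error probability implied by a Phred score: p = 10^(-Q/10)."""
    return 10.0 ** (-q / 10.0)


def accuracy_to_phred(accuracy: float) -> float:
    """Phred score implied by a predicted accuracy (1 - error probability)."""
    return -10.0 * math.log10(1.0 - accuracy)


def passes_ccs_cutoffs(num_passes: int, predicted_accuracy: float,
                       min_passes: int = 20,
                       min_accuracy: float = 0.9999) -> bool:
    """Apply the two inclusion criteria quoted in the text.

    Both comparisons are strict ('more than 20 passes', 'greater than 0.9999').
    """
    return num_passes > min_passes and predicted_accuracy > min_accuracy
```

The accuracy cutoff of 0.9999 is equivalent to a Phred score of about 40, so the reported median of 93 sits far above the minimum required for inclusion.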

Genomic data analyses with customized bioinformatic pipelines. Novel algo- 
rithms were developed to detect and analyse exon rearrangement in genes of 
interest. The algorithms were specifically designed to analyse long-read sequences 
generated by the Pacific Biosciences Sequel platform. A series of quality control 
(QC) procedures were performed before sequence processing to ensure high quality 
of the reads being analysed. 

Quality control: consensus sequence and read quality. PacBio circular consensus 
sequence (CCS) reads with fewer than 20 passes were filtered out to ensure overall 
sequence quality. Quality score distributions were examined: for APP gene PCR 
enriched sequences, average median read-wide Phred score was 93. In gencDNA 
analyses, we included only reads that met a mean Phred cutoff of >85. 
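
A read-level filter of this kind can be sketched as follows. This is an illustration, not the authors' pipeline: it assumes standard Phred+33 FASTQ quality encoding and an arithmetic mean over per-base scores, and the function names are hypothetical.

```python
def mean_phred(quality_string: str, offset: int = 33) -> float:
    """Arithmetic mean of per-base Phred scores from a FASTQ quality line.

    Assumes Phred+33 encoding, in which '~' (ASCII 126) encodes a score of 93.
    """
    scores = [ord(ch) - offset for ch in quality_string]
    return sum(scores) / len(scores)


def keep_read(quality_string: str, min_mean_phred: float = 85.0) -> bool:
    """Keep a read only if its mean Phred score exceeds the cutoff (>85)."""
    return bool(quality_string) and mean_phred(quality_string) > min_mean_phred
```

For example, a read whose quality line consists entirely of '~' has a mean Phred of 93 and is retained, whereas a read of uniform quality 'I' (Phred 40) is dropped.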

Quality control: sequencing artefacts. Owing to the intrinsic limits of PacBio SMRT 
sequencing technology, errors in homopolymers (that is, sequence ATTTG could 
be read as ATTTTG or ATTG in addition to ATTTG) are specially handled with 
a method that combines quality score information and reference sequence at the 
beginning of the homopolymer. The FASTQ files encoded uncertainty in the 
homopolymer run length in the first Phred score of each run. If this Phred score 
was lower than our threshold of 30, then this position was marked as a likely 
sequencing artefact and not a real variant. 
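
The homopolymer rule described above can be illustrated with a short sketch. It is a simplification under stated assumptions: qualities are supplied as a list of integer Phred scores, a homopolymer is taken to be any run of two or more identical bases (the minimum run length is not given in the text), and the function name is hypothetical. The reference-sequence comparison mentioned in the text is not reproduced here.

```python
from itertools import groupby


def flag_homopolymer_artefacts(seq, quals, min_run=2, threshold=30):
    """Return (start, base, run_length) for each homopolymer run whose
    first-base Phred score falls below the threshold.

    Such runs are treated as likely sequencing artefacts rather than
    real length variants.
    """
    flagged = []
    pos = 0
    for base, group in groupby(seq):
        run_len = len(list(group))
        if run_len >= min_run and quals[pos] < threshold:
            flagged.append((pos, base, run_len))
        pos += run_len
    return flagged
```

With the sequence ATTTG from the text, a low score on the first T marks the TTT run as uncertain in length, so an apparent ATTTTG or ATTG at that position would not be called as a variant.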

PCR primer filter. The reads were checked to ensure the correct start and end sites 
with forward and reverse PCR primer sequences. BLAST (command line tool 
‘blastn’ 2.6.0+) was used to align primer sequences in either orientation to each 
read with word size 13, gap open penalty 0, and gap extension penalty 2. Any read 
in which both primers were not detected was filtered out. Furthermore, reads on 
the negative strand were reverse complemented in this step. BLAST seed length 
was optimized to avoid ambiguity and ensure sensitivity. 
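
The logic of the primer filter can be sketched in a few lines. This example swaps in exact string matching at the read ends in place of the BLAST alignment used in the study, so it is stricter than the real filter and serves only to show the orientation step; the function names and the convention that the reverse primer is supplied 5' to 3' are assumptions.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def orient_read(read: str, fwd_primer: str, rev_primer: str):
    """Return the read in forward orientation if both primers are found at
    its ends, otherwise None (the read is filtered out).

    rev_primer is given 5'->3', so on a forward-strand read it appears
    reverse-complemented at the 3' end.
    """
    rev_rc = reverse_complement(rev_primer)
    for candidate in (read, reverse_complement(read)):
        if candidate.startswith(fwd_primer) and candidate.endswith(rev_rc):
            return candidate
    return None
```

A read sequenced from the negative strand is returned reverse complemented, so that all downstream exon alignments share one orientation.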

Alignment to APP reference sequences. The Ensembl reference sequence for APP was 
downloaded from the GRCh38 reference human genome assembly using the UCSC 
Genome Browser (http://genome.ucsc.edu/cgi-bin/hgGateway) with RefSeq acces- 
sion number NM_000484.3. Because the PCR primers begin at the start codon 
and end with the stop codon, sequences of exons 1 and 18 were trimmed to these
positions so that only the coding sequence of each of the 18 exons was kept and
stored as a FASTA file. Then, we used BLAST to look for local alignments between
the 18 exons and each quality-filtered CCS read; blastn parameters used: -outfmt 6,
-word_size 25, -gapopen 0, -gapextend 2. We used the resulting alignment
coordinates to mark regions of each read covered by exons. This allowed us to analyse
exon arrangements, lengths and patterns of exon-exon junctions. 
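
As an illustration of how alignment coordinates can be turned into an exon arrangement, the sketch below parses BLAST tabular output (outfmt 6, whose default columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). It assumes the exons were the queries and the reads the subjects, which the text does not state, and the function names are hypothetical; it is not the authors' exonjunction code.

```python
from collections import defaultdict


def exon_layout(blast_tabular: str):
    """Map each read id to its exon hits as (read_start, read_end, exon_id),
    sorted by position along the read.

    Assumes query = exon and subject = read in outfmt 6 rows.
    """
    layout = defaultdict(list)
    for line in blast_tabular.strip().splitlines():
        fields = line.split("\t")
        exon_id, read_id = fields[0], fields[1]
        sstart, send = int(fields[8]), int(fields[9])
        layout[read_id].append((min(sstart, send), max(sstart, send), exon_id))
    return {read_id: sorted(hits) for read_id, hits in layout.items()}


def junctions(hits):
    """List adjacent exon pairs in read order, e.g. [('exon3', 'exon16')]."""
    return [(a[2], b[2]) for a, b in zip(hits, hits[1:])]
```

A read in which a segment of exon 3 is followed directly by a segment of exon 16 would yield the pair (exon3, exon16), which is the kind of junction the analysis is designed to surface.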

SNV and INDEL analysis. First, we used reference sequences of APP exons to 
replace low quality individual nucleotides (potential homopolymer runs and other 
errors) within each read with their reference APP exon counterpart. Then, we 
analysed BLAST local alignments between each exon (or part of an exon) and the 
read sequence, nucleotide by nucleotide, to look for alignment mismatches. If the 
mismatch position was a different nucleotide, we assigned it as a single nucleotide 
variant (SNV); if the mismatch position was a hyphen in the exon sequence, we 
assigned it as an insertion and if the mismatch position was a hyphen in the read 
sequence we assigned it as a deletion. 
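
The three-way classification described above can be written compactly. The sketch below is a minimal illustration rather than the authors' varccs code: it takes the two gapped strings of a pairwise alignment (exon and read, equal length, with hyphens for gaps), and the function name and output tuple format are assumptions.

```python
def classify_variants(exon_aln: str, read_aln: str):
    """Walk an exon/read alignment column by column and classify mismatches.

    Returns a list of (kind, exon_position, detail), where exon_position is
    1-based; for insertions it is the exon base preceding the inserted base.
    """
    if len(exon_aln) != len(read_aln):
        raise ValueError("aligned sequences must have equal length")
    variants = []
    exon_pos = 0
    for exon_base, read_base in zip(exon_aln, read_aln):
        if exon_base != "-":
            exon_pos += 1
        if exon_base == read_base:
            continue
        if exon_base == "-":      # hyphen in the exon sequence: insertion
            variants.append(("insertion", exon_pos, read_base))
        elif read_base == "-":    # hyphen in the read sequence: deletion
            variants.append(("deletion", exon_pos, exon_base))
        else:                     # different nucleotide: SNV
            variants.append(("SNV", exon_pos, f"{exon_base}>{read_base}"))
    return variants
```

In practice this step would run only after low-quality positions have been replaced by their reference counterparts, as described above, so that homopolymer artefacts are not reported as INDELs.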

gencDNA production in culture. The method is graphically represented in Fig. 4a. 
CHO cells were serum-deprived for 2 days, followed by addition of the reverse
transcriptase inhibitors AZT (100 μM) and ABC (10 μM) (Tocris, Minneapolis, MN)
until the end of the experiment. The medium was changed daily with fresh reverse
transcriptase inhibitors. Cells were transfected with APP-751 driven by the CAG
promoter by GenJet (SignaGen Laboratories, Gaithersburg, MD) on day 3, then
on day 4, cells were treated with 0 μM, 5 μM, or 50 μM hydrogen peroxide (Fisher
Scientific) for 2 h. After 1 day, cells were collected and genomic DNA was extracted 
for PCR analysis. 

In vitro reverse transcriptase activity assay. Lysates were prepared in reverse 
transcriptase disruption buffer, and contained cOmplete, EDTA-free protease
inhibitor cocktail (Sigma-Aldrich, St. Louis, MO) and PhosSTOP phosphatase
inhibitors (Sigma-Aldrich, St. Louis, MO). The assay was performed essentially
as described, except the assay was separated into two parts. One microgram of
extract was used in the reverse transcription step of the assay. In addition, Primer 
A was used in the reverse transcription reaction cocktail instead of Primer B 
(Supplementary Information1). Reverse transcription was carried out at 37°C for 
45 min, followed by 15 min at 70°C. 

The reverse transcriptase product of this first step was assayed in triplicate by 
quantitative PCR. Levels of reverse transcription activity were determined by the 
Delta Cq method, compared to the activity in negative controls (water and no 
nucleotides), which were given Cq scores of 40. 100,000 picounits of SuperScript 
II Reverse Transcriptase (ThermoFisher Scientific) were used as a positive control 
for the assay. 
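
The Delta Cq comparison can be illustrated with a short calculation. This is a sketch under stated assumptions, not the authors' analysis script: triplicate Cq values are averaged, the negative controls are assigned a Cq of 40 as in the text, and amplification is assumed to double the product each cycle (an efficiency of 2, which the text does not specify). The function name is hypothetical.

```python
from statistics import mean

NEGATIVE_CONTROL_CQ = 40.0  # Cq assigned to water and no-nucleotide controls


def relative_rt_activity(sample_cqs, control_cq=NEGATIVE_CONTROL_CQ,
                         efficiency=2.0):
    """Fold activity over the negative control by the Delta Cq method.

    Delta Cq = control Cq - mean sample Cq; fold change = efficiency ** Delta Cq.
    """
    delta_cq = control_cq - mean(sample_cqs)
    return efficiency ** delta_cq
```

Under these assumptions, a lysate with a mean Cq of 30 would show about 1,000-fold more reverse transcription product than the negative controls, and a lysate with a Cq of 40 would be indistinguishable from them.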

Lysates for heat inactivation experiments were incubated for 15 min at 70°C 
before the reverse transcription step. For inhibitor experiments, lysates were incu- 
bated with inhibitor in the presence of all the components of the reaction except for 
dNTPs. After 10 min at room temperature, dNTPs were added and the reaction was 
incubated at 37°C as above. AZT-TP was purchased from TriLink Biotechnologies 
(San Diego, CA). 

Construction and retroviral transduction of synthetic human APP sequence 
targets. Phosphorylated oligonucleotides (Integrated DNA Technologies) composed
of human APP target sequences with BamHI and BglII restriction sites
on the 5’ ends were annealed and ligated into the BamHI site of the retroviral 
expression vector S-003-AB LZRSpBMN-linker-IRES-EGFP. All primer sequences 
for construction are listed in Supplementary Table 1. Single and concatamerized 
oligonucleotide inserts were identified by PCR using primers flanking the BamHI 
insertion site, and identified clones were sequenced to confirm insert copy number 
(Genewiz, La Jolla, CA). Helper-free ecotropic virus was produced by transfecting 
DNA constructs (Lipofectamine 2000, Thermo Fisher Scientific) with single or 
multiple copies of the oligonucleotide inserts into the retrovirus packaging line 
Phoenix-ECO. Forty-eight hours after transfection, retroviral supernatants were 
harvested, and 2 ml of selected virus was used for transduction of NIH-3T3 cells 
in 6-well plates. Retroviral transduction was carried out by removing the cell 
growth medium, replacing it with 2 ml retroviral supernatant containing 4 μg/ml
polybrene, and spinning at 25°C for 1 h at 2,800 r.p.m. Forty-eight hours after
transduction, the percentage of GFP+ cells, as identified by flow cytometry, was
used to evaluate transduction efficiency. 

Cell culture. NIH-3T3 (DMEM, 5% FBS), CHO-K1 (RPMI, 10% FBS), SH-SY5Y 
(DMEM/F12, 10% FBS), IMR-90 (DMEM, 10% FBS) and HEK-293 (DMEM, 10% 
FBS) cells were purchased from ATCC and maintained at 37°C under 5% CO2.
Although HEK-293 is in the database of commonly misidentified cell lines, this 
cell line was used to show protein expression of our targets in mammalian cells 
and as gencDNA negative controls. For the indicated purpose, there is no concern 
regarding this cell line in the manuscript. Cells from ATCC and all reagents used 
were verified to be mycoplasma free. 

Western blot. Cells were harvested in RIPA buffer (100 mM Tris-HCl, pH 7.6, 
250 mM NaCl, 0.1% sodium dodecyl sulfate, 0.2% deoxycholic acid, 0.5 mM dith- 
iothreitol, 1 mM EDTA, 0.5% NP-40 and 1% Triton X-100), and proteins in each 
lysate were analysed using rat monoclonal anti-HA antibody (Clone 3F10, Roche) 
and horseradish peroxidase-conjugated goat anti-rat secondary antibody (Cell 
Signaling). Enhanced chemiluminescent substrate (Millipore) for the reaction was 
added, and the signal was detected by BioRad bioimaging system. 

Variant toxicity assay. SH-SY5Y cells were transfected with APP RNA variants 
by lipofectamine LTX (Life Technologies) overnight and further cultured under 
serum-deprived conditions for 7 days. Cell viability was determined by WST1 
reagent (Roche Applied Science) according to the manufacturer’s protocol. In 
brief, cells in 96-well plates were incubated with 100 μl WST1 reagent and culture
medium in a ratio of 1:10 (v/v) per well at 37°C for 2 h. The absorbance at 440 nm 
of samples normalized by a background control was measured by microplate ELISA 
reader. 

Statistics and reproducibility. All statistical analyses were completed using Prism 
Version 7 (GraphPad). Specific tests, number of data points (n), and P values are 
reported in Figures, figure legends or Extended Data Figures. All experiments in 
Figures and Extended Data Figures were repeated at least three times (independent 
experiments) unless specified otherwise in the figure legends. 

Reporting summary. Further information on research design is available in 
the Nature Research Reporting Summary linked to this paper. 


Data availability 

Fastq files of SMRT sequences performed on a PacBio Sequel and Illumina 
sequences on NextSeq500 have been deposited in NCBI Sequence Read Archive 
(BioProject ID: PRJNA493258). The PacBio produced RNA-seq data sets from 
whole brain and temporal lobe supporting the findings of this study are available 
at https://www.pacb.com/blog/data-release-alzheimer-brain-isoform-sequencing-iso-seq-dataset,
and from the authors upon reasonable request and with permission
of PacBio, respectively. The source codes of the customized algorithms are available
on GitHub (https://github.com/christine-liu/exonjunction and
https://github.com/taolonglab/varccs).

Extended Data Fig. 1 | RT-PCR on bulk and sorted nuclei, and RISH.
a, RT-PCR products from bulk brain tissue samples from three
individuals with SAD and three without. Canonical APP splice variants
and non-APP products were identified. b, Representative gels showing the
presence of canonical APP splice variants (red arrows, n = 2 independent
experiments). c, No APP variants were identified in NeuN-negative nuclei
from individuals with or without SAD. The 18S rRNA control verified the
presence of RNA. Novel APP RNA variants were identified from oligo-dT
primed cDNA libraries from 50-cell populations of neuronal nuclei
(d, n = 3 biological replicates) and brains from individuals with
Alzheimer’s disease (e, commercially produced PacBio cDNA libraries).
f, RISH3/16 signal from antisense probes showed cytoplasmic distribution
of APP 3/16 RNA. Negative control sense probes and a probe targeting the
bacterial gene DapB showed no signal. g, PSEN1 RT-PCR on populations
of 50 nuclei from the brains of three individuals with SAD and three
without showed no PSEN1 RNA variants. The positive control (PC) is
amplified from RNA extracted from bulk brain tissue. 18S rRNA control
verified the presence of RNA.

Extended Data Fig. 2 | APP gencDNA detection by genomic DNA PCR,
DISH, and targeted genomic pull-down. a, Duplicate gel from Fig. 2b,
with more sensitive thresholds to show the clear absence of PSEN1 bands.
b, Nested PCR was used with alternative APP primers (three total sets:
APP 1-18, APP 1-18N, and APP 2-17). c, Cloning and Sanger sequencing
of indicated bands (red numbers in b) revealed novel APP gencDNAs (see
Fig. 1 for legend and nomenclature). d, APP 1-18 DNA PCR showed no
products in non-neuronal cell types: IMR-90 (human lung fibroblast),
HEK (human embryonic kidney) and non-neuronal (NeuN-negative)
genomic brain DNA from individuals with and without SAD. RNaseP
was used as a positive control. e, APP mRNA is expressed in HEK-293
and IMR-90 cells; 18S rRNA used as a positive control. f, Digestion
with the off-target restriction enzyme XbaI did not affect DISH3/16 or
DISH16/17 signals. g, h, Synthetic DNA containing 16/17 (g) or 3/16 (h)
target sequences (target), or wild-type human genomic APP sequences
lacking IEJs and exon-exon junctions (mutant target) were introduced
by retroviral transduction into NIH-3T3 cells. DISH16/17 and DISH3/16
signals from both sense and antisense probes were detected only in target
infected cells. i, Schematic of Agilent SureSelect targeted DNA pull-down.
j, Agilent SureSelect hybridization enrichment targeted the entire genomic
locus of APP and showed unbiased sequencing depth across the full
genomic locus. Exons and introns are shown on two scales.

Extended Data Fig. 3 | APP gencDNA reading frame analysis. a, Colour key for all gencDNAs with junctions identified by SMRT sequencing.
b, c, Percentage of unique in-frame reads from brains of individuals with SAD (b; 6,299 unique reads) or without SAD (c; 1,084 unique reads).

Extended Data Fig. 4 | APP gencDNA and RNA variant formation in
CHO cells. a, Time line of CHO cell experiments modified from Fig. 4a.
After transfection and gencDNA induction, serum was added and
CHO cell cultures were passaged for 7 days. Cells were harvested, and
DNA and RNA were extracted for analyses. b, c, PCR of genomic DNA
(b; gDNA) and RT-PCR with APP 1 and 18 primers (c; n = 2 independent
experiments). Note that APP plasmid is no longer detected (compare to
Fig. 4b). DNA breaks during cell proliferation might contribute to variant
formation in cells without DNA damage (no H2O2). Reverse transcriptase
inhibitor (RTi, AZT + ABC) treatment prevents formation of APP RNA
variants, indicating the dependence of RNA variants on gencDNAs.
d, Induced APP variants with IEJs observed in b, c.

Extended Data Fig. 5 | Data from six individual brains for each brain
from individuals with or without SAD represented as averages in
Fig. 5, and variant cytotoxicity. a, d, Nuclei sorted from cortices of six
individuals with SAD and six without were analysed by DISH16/17 (a)
and DISH3/16 (d). Cumulative frequency distribution plots and average
numbers of foci per nucleus show statistical significance (non-parametric
Kruskal-Wallis test with Dunn’s correction for multiple comparisons)
between all paired brain sets. Numbers above bars indicate number of
nuclei analysed. NS, not significant. Error bars show s.e.m. b, c, e, f,
Detailed P values for Fig. 5b (b), Fig. 5c (c), Fig. 5e (e) and Fig. 5f (f).
g, APP-751, three coding and one non-coding APP variant in constructs
containing haemagglutinin (HA) tags were transfected into HEK-293
cells. Cell lysates from all three coding variants and full-length APP-751
displayed protein products of the expected size by western blot. α-tubulin
was used as a loading control. h, Three coding APP variants were
transfected into SH-SY5Y cells individually and cell viability was measured
by WST-1 seven days after transfection under serum-deprived conditions.
Means of three independent experiments were analysed using ordinary
one-way ANOVA with uncorrected Fisher’s LSD for multiple comparisons
(*P = 0.0477, ****P < 0.0001).

Extended Data Fig. 6 | DISH3/16 and DISH16/17 data analyses. a, DISH3/16
data from individual J20 and wild-type mouse cortices represented as
an average in Fig. 5h; numbers above bars represent number of nuclei
analysed. b, No DISH16/17 signal was detected in wild-type mouse nuclei.
c, Detailed statistical significance of DISH16/17 signal across all mice in
Fig. 5j (non-parametric Kruskal-Wallis with Dunn’s multiple comparisons
test). **P < 0.01, ***P < 0.001, ****P < 0.0001. NS, not significant. d-f,
Synthetic DNA targets containing the exon 16/17 junction sequence were
introduced by retroviral transduction into NIH-3T3 cells, and the target
sequence (provirus) identified by DISH16/17. A concatamer (x2) showed
increased focus size, represented as a cumulative frequency distribution
plot (e) and a box and whisker plot (f). Line, median; box, 75th-25th
percentiles; whiskers, 90th-10th percentiles. Statistical significance
was calculated using non-parametric Kruskal-Wallis test with Dunn’s
correction for multiple comparisons. ****P < 0.0001.

Panel a (plot): y axis, % of nuclei with 3/16 foci; x axis, J20 and WT, NeuN+ and NeuN−.

Panel c (table): comparisons between mice identified by age in days.

Group                   Comparison      Summary   P value
Positive vs. negative   177+ vs. 177-   ns        >0.9999
                        566+ vs. 566-   ****      <0.0001
                        661+ vs. 661-   ***       0.0002
                        728+ vs. 728-   ****      <0.0001
                        748+ vs. 748-   ****      <0.0001
                        829+ vs. 829-   ****      <0.0001
Positive vs. positive   177+ vs. 566+   ****      <0.0001
                        177+ vs. 661+   ****      <0.0001
                        177+ vs. 728+   ****      <0.0001
                        177+ vs. 748+   ****      <0.0001
                        177+ vs. 829+   ****      <0.0001
                        566+ vs. 661+   ns        >0.9999
                        566+ vs. 728+   ***       0.0007
                        566+ vs. 748+   ns        0.2055
                        566+ vs. 829+   **        0.0082
                        661+ vs. 728+   ****      <0.0001
                        661+ vs. 748+   ***       0.0009
                        661+ vs. 829+   ****      <0.0001
                        728+ vs. 748+   ns        >0.9999
                        728+ vs. 829+   ns        >0.9999
                        748+ vs. 829+   ns        >0.9999
Negative vs. negative   177- vs. 566-   ns        >0.9999
                        177- vs. 661-   ns        >0.9999
                        177- vs. 728-   ns        0.5971
                        177- vs. 748-   ns        0.0561
                        177- vs. 829-   **        0.0026
                        566- vs. 661-   ns        >0.9999
                        566- vs. 728-   ns        >0.9999
                        566- vs. 748-   ns        >0.9999
                        566- vs. 829-   ns        >0.9999
                        661- vs. 728-   ns        >0.9999
                        661- vs. 748-   ns        0.8495
                        661- vs. 829-   ns        0.1125
                        728- vs. 748-   ns        >0.9999
                        728- vs. 829-   ns        >0.9999
                        748- vs. 829-   ns        >0.9999
Extended Data Table 1 | Nine distinct experimental approaches supporting APP recombination 

1. Method: RT-PCR and Sanger sequencing 
   Tested material: Nuclear RNA from human cerebral cortical neurons 
   Unit size of sample: 50-nuclei 
   Reproducibility: (1) Multiple brains; (2) Multiple RT-PCRs; (3) Multiple primers; (4) Multiple investigators 
   Result: Novel APP RNA variants with IEJs from multiple brains and experiments 

2. Method: RISH on IEJ 3/16 
   Tested material: Human SAD tissue sections 
   Unit size of sample: Tissue sections 
   Reproducibility: (1) Multiple sections; (2) Multiple investigators 
   Result: Cytoplasmic and mosaic IEJ 3/16 signals on human brain tissue 

3. Method: Whole transcriptome SMRT sequencing 
   Tested material: Human AD brain RNA prepared by PacBio 
   Unit size of sample: Bulk RNA 
   Reproducibility: Multiple scientists using independent approaches identified IEJ variants 
   Result: Novel APP RNA variants with IEJs 

4. Method: Targeted RNA SMRT sequencing 
   Tested material: Human temporal lobe RNA, pulldown of APP related genes by PacBio 
   Unit size of sample: Bulk RNA 
   Reproducibility: Multiple scientists using independent approaches identified IEJ variants 
   Result: Novel APP RNA variants with IEJs 

5. Method: PCR and Sanger sequencing 
   Tested material: Genomic DNA from SAD and non-diseased cerebral cortical neurons and non-neurons 
   Unit size of sample: DNA equivalent to 20-nuclei 
   Reproducibility: (1) Multiple brains; (2) Multiple PCRs; (3) Multiple primers; (4) Multiple polymerases; (5) Multiple investigators 
   Result: Conserved, 8 gencDNAs including the same IEJs in genomic DNA as found in RNAs 

6. Method: PCR and SMRT sequencing 
   Tested material: Genomic DNA from SAD and non-diseased cerebral cortical neurons 
   Unit size of sample: Small nuclei populations (20-1000) 
   Reproducibility: (1) Multiple brains; (2) Multiple sequencing runs 
   Result: At least 6,299 unique APP gencDNA variants from AD brain; 1,084 from non-diseased brain 

7. Method: Sequencing of APP genomic locus, pulled down by Agilent SureSelect 
   Tested material: Genomic DNA from SAD cerebral cortical neurons 
   Unit size of sample: 200 ng of DNA 
   Reproducibility: Two complete sets of pull-down and sequencing 
   Result: APP gencDNAs 

8. Method: DISH on gencDNAs 
   Unit size of sample: Single nuclei 
   Reproducibility: (1) Multiple investigators; (2) Multiple brains; (3) Multiple sense probes 
   Tested material: SAD and non-diseased cerebral cortical neuronal nuclei 
   Result: Up to 13 spatially distinct APP gencDNA loci in single neuronal nucleus, enriched in AD and rare in non-neuronal nuclei 
   Tested material: J20 and WT cerebral cortical neuronal vs. non-neuronal nuclei 
   Result: Increased APP IEJ 3/16 gencDNA signal in J20 neurons, but not non-neuronal nuclei 
   Tested material: J20 and WT cerebral cortical neuronal vs. non-neuronal nuclei 
   Result: Age-related increases in APP Ex 16/17 foci diameter in J20 neuronal but not non-neuronal nuclei 

9. Method: APP751 over-expression in CHO cells with H2O2 treatment 
   Tested material: CHO cells 
   Unit size of sample: Bulk DNA 
   Reproducibility: (1) Multiple experiments; (2) Multiple investigators 
   Result: APP gencDNA with IEJs identified, reverse transcriptase activity and DNA breaks dependent 

Evidence of APP recombination reported in this paper is summarized with experimental methods, materials, unit sizes of sample and reproducibility. 
Extended Data Table 2 | Human postmortem brain information 


Brain name  Braak  Sex  PMI (hours)  Age (years) 
SAD-1 6 F 6 88 
SAD-2 6 F 12 88 
SAD-3 6 F 6 84 
SAD-4 6 F 86 
SAD-5 6 M 5 83 
SAD-6 6 F 10 72 
SAD-7 5 F 3. 77 
ND-1 1 M U 87 
ND-2 1 F 72 83 
ND-3 U M U 83 
ND-4 1 F 12 80 
ND-5 1 F 18 93 
ND-6 2 M 12 94 


F, female; M, male; U, unknown; PMI, post mortem interval. All brains were from the pre-frontal cortex and obtained from the University of California San Diego Alzheimer’s Disease Research Center 
and the University of California Irvine Institute for Mind Impairments and Neurological Disorders. 




Extended Data Table 3 | APP variants information 


Columns: Name; RNA PCR; DNA PCR; Coding or Non-coding; Start (bp); Break Start; Break End; End; Sanger Sequence Homology; # of bp in homology; # of mismatches 

RT-PCR Identified Variants 
R1_11.1 Y Y Non-Coding 1 32 1431 2313 CACTGCTCTGCAGGC 15 3 
R1_11.2 Y Non-Coding 1 44 1456 2313 CGGC 4 0 
R1_14 Y Y Non-Coding 1 46 1814 2313 AGCTC 5 1 
R2_14 Y Non-Coding 1 200 1749 2313 ACCAAGGA 8 0 
R2_16 Y Non-Coding 1 216 2015 2313 AT 2 0 
R2_17 Y Y Non-Coding 4 64 2102 2313 CA 2 0 
R2_18 Y Y Coding 1 211 2267 2313 GC 2 0 
R3_14 Y Y Coding 1 267 1890 2313 AGCCAAC 7 0 
R3_16 Y Y Coding 1 251 2008 2313 AA 2 0 
R3_17 Y Y Non-Coding 1 314 2123 2313 GCAGTG 6 0 
R6_17 Y Coding 1 673 2079 2313 AGATGGGAGTGAAGACAAAG 20 0 
R6_18 Y Coding 1 740 2233 2313 GAGGA 5 0 
DNA PCR Identified Variants 
D2_18 Y Coding 1 120 2287 2310 N/A N/A N/A 
D1_17 Y Non-Coding 18 51 2159 2285 N/A N/A N/A 
D2_16 Y Non-Coding 19 209 2016 2285 TGCAGAATT 9 1 
D2_17 Y Non-Coding 18 64 2102 2285 CA 2 0 
D2_16.2 Y Non-Coding 157 209 2016 2095 TGCAGAATT 9 1 
Commercially available PacBio RNA-Seq 
P3_9 Y n/a Non-coding 303 345 1093 +853 CCTAC 5 0 
P6_12 Y n/a Non-coding -A1 724 1483 +853 AAGAAG 6 0 
P6_18 Y n/a Non-coding -111 2029 2274 +936 AATTCCGAC 9 0 
DNA PCR on CHO cells: Induced Variants 
iD1_17 n/a Y Non-Coding 1 51 2159 2313 N/A N/A N/A 
iD2_13 n/a Y Coding 1 170 1626 2313 N/A N/A N/A 
iD4_15 n/a Y Missense 1 434 1920 2313 TGA 3 1 
iD6_18 n/a Y Coding 1 705 2269 2313 i 1 0 
iD2_17 n/a Y Non-Coding 1 64 2102 2313 CA 2 0 
RNA PCR on CHO cells: Induced Variants 
iR2/3 Y n/a Non-Coding 1 64 319 2313 ACCCA 5 0 
iR5/15 Y n/a Non-Coding 1 618 1921 2313 GATGACTCG 9 2 
iR2/4_ Y n/a Non-Coding 1 197/ 399/ 2313 CAA/GA 3/2 0/0 
717 934 2077 
iR3/13 Y n/a Non-Coding 1 318 1682 2313 AAG 3 0 
iR4/18 Y n/a Missense 1 397 2285 2313 ACAAGT 6 0 


Detailed information on identified APP RNA and DNA variants. 
Extended Data Table 4 | DISH and RISH experiments and validation list 


Columns: Junction; Target; Sample Type; Probes; Figure Panel 

Junction 16/17, target DNA 
   Human nuclei +RNase: Exp Sense, [illegible]; Exp Antisense, Fig. 2d,f,g 
   Human nuclei +RNase + restriction enzyme (MluCI): Neg Sense, Fig. 2h 
   Human nuclei +RNase + off-target restriction enzyme (XbaI): [illegible] 
   Synthetic target: Pos Sense, ED Fig. 2g; Pos Antisense, ED Fig. 2g 
   Synthetic mutant target: Neg Sense, ED Fig. 2g; Neg Antisense, ED Fig. 2 
   Synthetic target concatamer: Pos Sense, ED Fig. 6d-f 
   WT mouse nuclei +RNase: Neg Sense, Fig. 5i,j; ED Fig. 6b,c 
   J20 mouse nuclei +RNase: Exp Sense, Fig. 5i,j; ED Fig. 6b,c 

Junction 3/16, target DNA 
   Human nuclei +RNase: Exp Sense, Fig. 2h,e,i,n; [illegible]; Fig. 5d,e,f; Exp Antisense, Fig. 2h,e,i 
   Human nuclei +RNase + restriction enzyme (PstI & MslI): Neg Sense, Fig. 2l,m 
   Human nuclei +RNase + off-target restriction enzyme (XbaI): Pos Sense, ED Fig. 2f 
   Synthetic target: Pos Sense, ED Fig. 2h; Pos Antisense, ED Fig. 2h 
   Synthetic mutant target: Neg Sense, ED Fig. 2h; Neg Antisense, ED Fig. 2h 
   WT mouse nuclei +RNase: Exp Sense, Fig. 5g,h; ED Fig. 6a 
   J20 mouse nuclei +RNase: Exp Sense, Fig. 5g,h; ED Fig. 6a 

Junction 3/16, target RNA 
   SAD tissue: Neg Sense, ED Fig. 1f; Exp Antisense, ED Fig. 1f; Neg DapB, ED Fig. 1f 

Junction In2/Ex3, target DNA 
   Human nuclei +RNase: Exp Sense, Fig. 2n 


In, intron; Ex, exon; Exp, experimental; Neg, negative control; Pos, positive control; ED, Extended Data. DNA and RNA in situ hybridization experiments, positive and negative controls are summarized. 
Predictable and precise template-free 
CRISPR editing of pathogenic variants 


Max W. Shen, Mandana Arbab, Jonathan Y. Hsu, Daniel Worstell, Sannie J. Culbertson, Olga Krabbe, 
Christopher A. Cassa, David R. Liu, David K. Gifford & Richard I. Sherwood 


Following Cas9 cleavage, DNA repair without a donor template is generally considered stochastic, heterogeneous and 
impractical beyond gene disruption. Here, we show that template-free Cas9 editing is predictable and capable of precise 
repair to a predicted genotype, enabling correction of disease-associated mutations in humans. We constructed a library 
of 2,000 Cas9 guide RNAs paired with DNA target sites and trained inDelphi, a machine learning model that predicts 
genotypes and frequencies of 1- to 60-base-pair deletions and 1-base-pair insertions with high accuracy (r=0.87) in 
five human and mouse cell lines. inDelphi predicts that 5-11% of Cas9 guide RNAs targeting the human genome are 
‘precise-50’, yielding a single genotype comprising greater than or equal to 50% of all major editing products. We 
experimentally confirmed precise-50 insertions and deletions in 195 human disease-relevant alleles, including correction 
in primary patient-derived fibroblasts of pathogenic alleles to wild-type genotype for Hermansky-Pudlak syndrome 
and Menkes disease. This study establishes an approach for precise, template-free genome editing. 


Clustered regularly interspaced short palindromic repeats (CRISPR)-Cas9 
has revolutionized genome editing, providing powerful research 
tools and promising agents for the potential treatment of genetic 
diseases. The DNA-targeting capabilities of Cas9 have been improved 
by the development of guide RNA (gRNA) design principles, 
modelling of factors leading to off-target DNA cleavage, enhancement of 
Cas9 sequence fidelity by modifications to the nuclease and gRNA, 
and the evolution or engineering of Cas9 variants with alternative 
PAM sequences. Similarly, control over the product distribution of 
genome editing has been advanced by the development of base editing 
to achieve precise and efficient single-nucleotide mutations, 
and the improvement of template-directed homology-directed repair 
(HDR) of double-stranded breaks. Despite these developments, base 
editing does not mediate insertions or deletions, and HDR is limited 
by low efficiency, particularly in non-dividing cells, and by undesired 
by-products. As many human genetic variants associated with disease 
arise from insertions and deletions, methods to efficiently introduce 
insertions and deletions to alleviate pathogenic mutations in a 
predictable manner with a major single-genotype outcome would advance the 
field of genome editing. 

Non-homologous end joining (NHEJ) and microhomology-mediated 
end joining (MMEJ) processes are major pathways involved 
in the repair of Cas9-mediated double-stranded breaks that can result 
in highly heterogeneous repair outcomes comprising hundreds of 
repair genotypes. Although end-joining repair of Cas9-mediated 
double-stranded DNA breaks has been harnessed to facilitate knock-in of 
DNA templates or deletion of intervening sequence between two 
cleavage sites, NHEJ and MMEJ are not generally considered useful for 
precision genome editing applications. Previous work has found that 
the heterogeneous distribution of Cas9-mediated editing products at a 
given target site is reproducible and dependent on local sequence 
context, but no general methods have been described to predict 
genotypic products following Cas9-induced double-stranded DNA breaks. 

In this study, we developed a high-throughput Streptococcus pyo- 
genes Cas9 (SpCas9)-mediated repair outcome assay to characterize 
end-joining repair products at Cas9-induced double-stranded breaks 
using 1,872 target sites based on sequence characteristics of the human 
genome. We used the resulting rich set of repair product data to train 
inDelphi, a machine learning algorithm that accurately predicts the 
frequencies of the substantial majority of template-free Cas9-induced 
insertion and deletion events at single-base resolution (https:// 
indelphi.giffordlab.mit.edu/). We find that, in contrast to the notion 
that end-joining repair is heterogeneous, inDelphi identifies that 5-11% 
of SpCas9 gRNAs in the human genome induce a single predictable 
repair genotype in >50% of editing products. Building on this idea 
of precision gRNAs, we used inDelphi to design 14 gRNAs for high- 
precision template-free editing yielding predictable 1-bp insertion genotypes 
in endogenous human disease-relevant loci and experimentally 
confirmed highly precise editing (median 61% among edited products) 
in two human cell lines. We used inDelphi to reveal human pathogenic 
alleles that are candidates for efficient and precise template-free gain- 
of-function genotypic correction and achieved template-free correc- 
tion of 183 pathogenic human microduplication alleles to the wild-type 
genotype in >50% of all editing products. Finally, we integrate these 
developments to achieve high-precision correction of five pathogenic 
low-density lipoprotein receptor (LDLR) microduplication alleles in 
human and mouse cells, as well as correction of endogenous pathogenic 
microduplication alleles for Hermansky-Pudlak syndrome (HPS1) 
and Menkes disease (ATP7A) to the wild-type sequence in primary 
patient-derived fibroblasts. 
Fig. 1 | High-throughput assaying of Cas9-mediated DNA repair 
products supports the design of the inDelphi model. a, A 
high-throughput genome-integrated library for assaying Cas9 editing products. 
b, Categories of editing products at 1,996 lib-A target sites in mESCs. 
c, Categories of editing products in 89 VO endogenous target sites in 
HEK293 cells. d, Mechanism of microhomology-mediated end-joining 
repair. e, inDelphi uses machine learning to predict the frequencies of 
editing products from target DNA sequence (selected outcomes depicted 
in table). Major editing outcomes include +1 to -60 indels. 


Template-free Cas9 editing is predictable 

To capture Cas9-mediated end-joining repair products across a wide 
variety of target sites, we designed a genome-integrated gRNA and 
target library screen in which many unique gRNAs are paired with 55-bp 
target sites containing a single canonical ‘NGG’ SpCas9 
protospacer-adjacent motif (PAM) that directs cleavage to the centre of each 
target site (Fig. 1a). Previously reported repair products at 90 loci in 
three human cell lines (HCT116, K562, and HEK293; we refer to the 
collective data set as VO) showed that 94% of endogenous 
cut-site-proximal Cas9-mediated deletions are <30 bp (Extended Data Fig. 1), 
suggesting that our assay can assess the vast majority of cut-site-proximal 
editing products. To explore repair products among sequences 
representative of the human genome, we designed 1,872 target sites spanning 
the distributions of GC content, number of nucleotides participating 
in microhomology, predicted Cas9 on-target cutting efficiency, and 
estimated precision of deletions (Supplementary Methods, Extended 
Data Fig. 1), as well as 90 VO target sites to create a library (denoted 
hereafter as ‘lib-A’) in which each target site is accompanied by a 
corresponding gRNA on the same DNA molecule. Through a multi-step 
process (Extended Data Fig. 1), we constructed and cloned lib-A into 
a plasmid backbone allowing Tol2 transposon-based integration into 
the genome, gRNA expression and hygromycin selection for cells 
with library members. 

We stably integrated lib-A into the genomes of mouse embryonic 
stem cells (mESCs) and human U2OS cells, then targeted these cells 
with a Tol2 transposon-based SpCas9 expression plasmid containing 
a blasticidin expression cassette and selected for cells with stable 
Cas9 expression while maintaining more than 2,000-fold coverage of 
the library. After one week, we collected genomic DNA from these 
cells (three independent biological replicates in mESCs, two in 
U2OS cells) along with control cells not treated with Cas9 (one in 
each) and performed paired-end high-throughput DNA sequencing 
(HTS) to reveal the distribution of cut-site-proximal repair products 
at each target site (Extended Data Fig. 1). We tabulated the resulting 
192,055,534 sequencing reads using a sequence alignment procedure 
(Supplementary Methods) that identified an average of 245 unique 
repair outcomes with high confidence (Supplementary Methods) per 
target site in mESCs (45 in U2OS cells) after adjusting with control 
data. Repair outcomes in experimental replicates within the same cell 
type were consistent (median r = 0.89 in mESCs, 0.77 in U2OS cells, 
Extended Data Fig. 1). 

In lib-A data from mESCs and U2OS cells as well as in endogenous 
data in HEK293, K562 and HCT116 cells, end-joining repair of 
Cas9-mediated double-stranded breaks primarily caused deletions (on 
average 63-87% of all edited products across cell types) and insertions 
(13-37% of all products) (Fig. 1b, c, Extended Data Fig. 2). A large 
fraction of products were deletions containing microhomology consistent 
with MMEJ (39-58% of all products, 62-75% of deletions, Fig. 1b-d, 
Extended Data Fig. 2, Supplementary Discussion). Three repair classes 
constituted 80-95% of all observed editing products (Fig. 1b, c): 
microhomology (MH) deletions, microhomology-less (MH-less) deletions, 
and single-base (1-bp) insertions; we define these three repair classes 
as constituting all major editing outcomes. The insertion and deletion 
(indel) frequencies at 86 target sites were consistent between 
endogenous data in HEK293, K562 and HCT116 cells and lib-A data in mESCs 
and U2OS cells (median r=0.65 to 0.82 for pairs of cell types when 
adjusting for 1-bp insertion frequencies, median r=0.52 to 0.76 
without adjustment, Extended Data Fig. 1). Together, these data confirm 
that Cas9-mediated editing products from our library assay reflect 
previously reported endogenous editing in human cells. 

Using lib-A, we designed a new machine learning model, inDelphi, to 
predict the frequency of all major editing outcomes at any given target 
site. This model consists of three interconnected modules aimed at 
predicting MH deletions, MH-less deletions and 1-bp insertions (Fig. 1e). 

inDelphi predicts MH deletions using a module that simulates the 
MMEJ repair mechanism, in which 5′-3′ end resection at a 
double-stranded break reveals two 3′ single-stranded DNA overhangs that 
can anneal through sequence microhomology. Extraneous 
single-stranded DNA overhangs are eliminated, and DNA synthesis and 
ligation generates a double-stranded DNA repair product (Fig. 1d). 
Through this mechanism, each microhomology results in a distinct 
deletion genotype (Fig. 1d, Supplementary Discussion). inDelphi 
assigns a score (phi) to a candidate microhomology based on a 
neural-network-learned score using its length and GC content with a penalty 
based on the deletion length. Relative frequencies are obtained by 
normalizing the phi scores of microhomologies of interest to sum to one, 
thereby modelling MH deletions as a competitive process. 
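
The normalization step can be illustrated with a short Python sketch. The scoring function `toy_phi` below is a hypothetical hand-written stand-in and is not the neural network that inDelphi learns; its weights, the candidate names and the feature values are all assumptions chosen only to show how competing microhomologies are converted into relative frequencies that sum to one.

```python
import math

def toy_phi(mh_length, gc_fraction, deletion_length):
    """Hypothetical stand-in for a learned phi score (NOT the inDelphi network).

    Longer and GC-richer microhomologies score higher; longer deletions
    are penalized. The weights are arbitrary illustrative values.
    """
    return math.exp(0.5 * mh_length + 1.0 * gc_fraction - 0.1 * deletion_length)

def relative_frequencies(candidates):
    """Normalize phi scores so that competing MH deletions sum to one."""
    scores = {name: toy_phi(*features) for name, features in candidates.items()}
    total = sum(scores.values())
    return {name: score / total for name, score in scores.items()}

# Illustrative candidates: (microhomology length, GC fraction, deletion length)
candidates = {
    "del5_mh3": (3, 2 / 3, 5),
    "del9_mh4": (4, 0.5, 9),
    "del14_mh2": (2, 1.0, 14),
}
freqs = relative_frequencies(candidates)
for name, freq in sorted(freqs.items(), key=lambda item: -item[1]):
    print(f"{name}: {freq:.3f}")
```

Because every candidate shares the same denominator, raising the score of one microhomology necessarily lowers the predicted frequency of the others, which is the sense in which the process is competitive.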

inDelphi models deletions inconsistent with MMEJ with a second 
neural network module that predicts the total frequency of groups of 
MH-less deletion outcomes using the minimum required resection 
length as the only input feature (Fig. 1e). We hypothesize that MH-less 
deletions arise primarily from the classical and alternative NHEJ 
pathways (Supplementary Discussion). 

The MH and MH-less neural networks were jointly trained using 
data from 1,095 lib-A target sites in mESCs with backpropagation 
in a multi-task manner to predict both deletion length frequencies 
Fig. 2 | Sequence context influences 1-bp insertions. a, 1-bp insertion 
frequencies (mean ± 95% confidence intervals) among 1,981 lib-A target 
sites. b, Comparison of 1-bp insertion frequencies among Cas9-edited 
products from 1,996 lib-A target sites. The box denotes the 25th, 50th 
and 75th percentiles, whiskers show 1.5 times the interquartile range, and 
outliers are depicted as diamonds. *P=5.4 x 10-*°, **P=8.6 x 10°”, 
two-sided t-test. c, DNA motif for 1-bp insertion frequency (lib-A, mESCs, 
n= 1,996 target sites). d, Frequencies of 1-bp insertions among 205 target 
sites with varying -5 to -2 nucleotides (relative to the PAM at positions 
0-2) in three low-microhomology contexts. See Extended Data Fig. 5 for 
full axis labels. e, Comparison of the 1-bp insertion frequency at sequences 
in c with varying positions -4 and -3. Box plot as in b. *P=0.03, 

**P = 2.98 x 107’, two-sided t-test. 


and MH genotype frequencies (Fig. 1e, Supplementary Methods). 
Computational experiments confirmed that the design of the 
neural network modules was important for overall performance 
(Supplementary Methods). From training data, inDelphi learned that 
strong microhomologies tend to be long and have high GC content and 
that the frequency of MH-less deletions decays rapidly with increasing 
length (Extended Data Fig. 2). For 1- to 30-bp deletions, at a typical 
target site in the human genome, inDelphi makes one prediction for 
each of 92 possible MH deletions, and 30 predictions for 274 possible 
MH-less deletion genotypes. 

inDelphi contains a third module that uses k-nearest neighbours 
to predict 1-bp insertions (Fig. 1e), which represent a major class of 
edited products (9-30% of all edited products, Fig. 1b, Extended Data 
Fig. 2). The frequency of 1-bp insertions and their resultant genotypes 
depend strongly on local sequence context. They are predominantly 
duplications of the -4 nucleotide (counting the NGG PAM as 
nucleotides 0-2, Fig. 1e), with higher precision and frequency when the -4 
nucleotide is an A or T (Fig. 2a, b). A linear regression model trained to 
predict the frequency of 1-bp insertions among major editing outcomes 
from local sequence context performed well on held-out lib-A target 
sites (that is, those not included in the training of the model) in mESCs 
(n=499, r=0.63, Fig. 2c) and U2OS cells (n = 492, r=0.65, Extended 
Data Fig. 3). In both cell types, target sites with weak microhomology 
(low total phi score) or low deletion precision score (Supplementary 
Methods) were significantly more likely to yield insertions at the expense 
of deletions (P < 2.0 x 1077, Extended Data Fig. 3). Randomization of 
four nucleotides surrounding the Cas9 cleavage site in three constant 
background sequences with weak microhomology revealed substantial 
variation in 1-bp insertion frequency (from <5% to >80% of all edited 
products, Fig. 2d, Extended Data Fig. 3) and identified mini-motifs con- 
sistent with lib-A (Fig. 2e), suggesting that local sequence context is a 
highly influential and causal factor for 1-bp insertion repair. 
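
The position convention used above (NGG PAM at nucleotides 0-2, dominant 1-bp insertion being a duplication of the -4 nucleotide) can be made concrete with a minimal Python sketch. The function name `predicted_1bp_insertion` and the example sequence are hypothetical; the sketch only applies the coordinate rule and the well-known SpCas9 cut position between nucleotides -4 and -3, and it does not estimate how frequent the insertion would be.

```python
def predicted_1bp_insertion(target, pam_index):
    """Duplicate the -4 nucleotide at the SpCas9 cut site.

    target    : DNA sequence (protospacer strand, 5' to 3') containing an NGG PAM
    pam_index : 0-based index of the 'N' of the NGG PAM (position 0 in the text)
    """
    if pam_index < 4:
        raise ValueError("need at least four nucleotides upstream of the PAM")
    if target[pam_index + 1:pam_index + 3] != "GG":
        raise ValueError("expected an NGG PAM at pam_index")
    cut = pam_index - 3                  # cut lies between positions -4 and -3
    minus4 = target[pam_index - 4]       # nucleotide immediately 5' of the cut
    return target[:cut] + minus4 + target[cut:]

# Illustrative sequence: the PAM 'AGG' starts at index 15, so position -4 is 'T'
example = "ACGTTGCATGATCTAAGG"
edited = predicted_1bp_insertion(example, pam_index=15)
print(edited)
```

The edited sequence is one nucleotide longer than the input, with the -4 nucleotide now present twice on either side of the original cut position.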
On the basis of these data, inDelphi models insertions and deletions 
as competitive processes in which microhomology strength and 
precision of deletions influence the relative frequency of 1-bp insertions, 
and local sequence context influences the relative frequency and 
genotypic outcomes of 1-bp insertions (Fig. 1e). inDelphi makes 
predictions within each module in a cell-type-invariant manner, only using 
cell-type-specific data to predict the overall ratio of 1-bp insertions to 
deletions. Collectively across all three modules, inDelphi predicts the 
indel lengths of 80-95% of Cas9-mediated editing products and the 
genotypes of 65-80% of all products (Fig. 3a, Extended Data Fig. 4) 
from sequence context alone. 

inDelphi achieves high accuracy in predicting genotype frequencies 
(median r= 0.94) and indel length frequency distributions (median 
r=0.91) in 189 held-out lib-A target sites in mESCs (Extended Data 
Fig. 4), with similarly high accuracy in U2OS cells (median r=0.88 and 
0.91, Extended Data Fig. 4). On held-out endogenous data, inDelphi 
also performed strongly on the two tasks (median r= 0.87 and 0.84 
across 87-90 target sites in K562, HCT116 and HEK293 cells, Fig. 3b, c). 
Taken together, these results establish that in data from five human 
and mouse cell lines, the relative frequencies of most Cas9-nuclease- 
mediated editing outcomes are highly predictable. 
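
The accuracy figures above are Pearson correlations computed per target site and then summarized by their median. A minimal sketch of that bookkeeping follows, assuming each site supplies a predicted and an observed frequency vector over the same outcomes; the three sites and their numbers are invented for illustration and are not data from the study.

```python
import math
from statistics import median

def pearson_r(predicted, observed):
    """Pearson correlation between two equal-length frequency vectors."""
    n = len(predicted)
    if n != len(observed) or n < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov / math.sqrt(var_p * var_o)

# Hypothetical per-site (predicted, observed) outcome frequencies
sites = {
    "site_A": ([0.50, 0.30, 0.20], [0.55, 0.25, 0.20]),
    "site_B": ([0.10, 0.60, 0.30], [0.20, 0.50, 0.30]),
    "site_C": ([0.40, 0.40, 0.20], [0.10, 0.30, 0.60]),
}
per_site_r = {name: pearson_r(p, o) for name, (p, o) in sites.items()}
median_r = median(per_site_r.values())
print(per_site_r, median_r)
```

Reporting the median rather than the mean keeps a few poorly predicted sites from dominating the summary.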

The ability of Cas9-mediated end-joining repair to induce 
frameshifts enables efficient gene knockout. We reasoned that 
inDelphi’s accurate prediction of indel lengths when considering 
nearly all editing products would enable accurate prediction of 
Cas9-induced frameshifts. We simulated this task in data from 
82-91 endogenous target sites by tabulating the observed frequency 
of indels resulting in +0, +1 and +2 reading frames. In HEK293 
cells, the observed frequency of indels in each frame predicted by 
inDelphi (median r = 0.81) compares favourably to that generated 
by ‘Microhomology Predictor’, a previously published method 
(median r = 0.37, Fig. 3d), with similar results in HCT116 and K562 
cells (Extended Data Fig. 4). Thus, we expect inDelphi to facilitate 
Cas9-mediated gene knockout approaches by allowing a priori 
selection of gRNAs that induce high or low knockout frequencies. We note 
that microhomology deletions in human exons have a significant 
tendency to remain in-frame compared to non-coding human DNA 
(Extended Data Fig. 4). 
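
Tabulating reading frames from an indel-length distribution is simple modular arithmetic, sketched below in Python. The sign convention (insertions positive, deletions negative, frame taken as the length modulo 3) and the example distribution are assumptions made for illustration; the paper's own tabulation procedure is described in its Supplementary Methods.

```python
def frame_frequencies(indel_length_freqs):
    """Collapse an indel-length distribution into +0, +1 and +2 frame frequencies.

    indel_length_freqs maps signed indel length (insertion > 0, deletion < 0)
    to frequency. Python's % always returns 0, 1 or 2 for a modulus of 3,
    so a 1-bp deletion (-1) lands in frame +2.
    """
    total = sum(indel_length_freqs.values())
    if total <= 0:
        raise ValueError("frequencies must sum to a positive value")
    frames = {0: 0.0, 1: 0.0, 2: 0.0}
    for length, freq in indel_length_freqs.items():
        frames[length % 3] += freq / total
    return frames

# Hypothetical predicted distribution for one gRNA
predicted = {1: 0.30, -1: 0.10, -2: 0.05, -3: 0.25, -6: 0.10, -7: 0.20}
frames = frame_frequencies(predicted)
frameshift_frequency = 1.0 - frames[0]   # products expected to disrupt the frame
print(frames, frameshift_frequency)
```

Under these assumptions, a gRNA chosen for knockout would be one with a high `frameshift_frequency`, while a low value flags a site where many products stay in frame.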


Highly precise template-free Cas9 editing 

Although end-joining repair is highly efficient in inducing mutations 
after Cas9 treatment, its propensity to induce a heterogeneous 
mixture of repair genotypes has limited applications for precision genome 
editing. We used inDelphi to estimate the fraction of SpCas9 gRNAs 
targeting exons and introns in the human genome that support precise 
end-joining repair. Defining ‘precision-x’ gRNAs as those predicted 
to produce a single genotypic outcome in >x% of all major editing 
outcomes proximal to the cleavage site, inDelphi predicts that 28% and 
47% of gRNAs are precision-30, whereas 5% and 11% of gRNAs are 
precision-50, when trained on mESC and U2OS cell data, respectively 
(Fig. 3f, Extended Data Table 1). 
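
The ‘precision-x’ definition translates directly into a threshold on the most frequent predicted genotype. The Python sketch below assumes each gRNA comes with a dictionary of predicted genotype frequencies among major editing outcomes; the gRNA names, genotype labels and frequencies are hypothetical, and the helper names are not part of any published inDelphi interface.

```python
def precision(genotype_freqs):
    """Frequency of the single most common genotype among major editing outcomes."""
    total = sum(genotype_freqs.values())
    if total <= 0:
        raise ValueError("frequencies must sum to a positive value")
    return max(genotype_freqs.values()) / total

def is_precision_x(genotype_freqs, x):
    """True if one genotype makes up more than x% of major editing outcomes."""
    return precision(genotype_freqs) > x / 100.0

# Hypothetical predictions for three gRNAs
predictions = {
    "gRNA_1": {"ins_A": 0.62, "del_3": 0.20, "del_7": 0.18},
    "gRNA_2": {"ins_T": 0.35, "del_2": 0.33, "del_5": 0.32},
    "gRNA_3": {"del_4": 0.25, "del_1": 0.25, "ins_G": 0.25, "del_9": 0.25},
}

def fraction_precision_x(all_predictions, x):
    """Fraction of gRNAs that qualify as precision-x."""
    hits = sum(is_precision_x(freqs, x) for freqs in all_predictions.values())
    return hits / len(all_predictions)

print(fraction_precision_x(predictions, 30), fraction_precision_x(predictions, 50))
```

Applied genome-wide, the same two-step calculation (per-gRNA maximum, then fraction above the threshold) yields the kind of precision-30 and precision-50 percentages quoted in the text.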

To test the accuracy of the predictions made using inDelphi of 
precise repair in endogenous settings, we selected 14 SpCas9 gRNAs 
predicted to induce precision-40 1-bp insertions. We delivered 
SpCas9 with gRNAs and performed endogenous HTS in human 
U2OS and HEK293T cells. We observed that 10 out of 14 predicted 
precision-40 1-bp insertion gRNAs induced a single 1-bp insertion 
genotype in >40% of edited products with an overall significantly 
higher precision (P < 4.2 x 1078) than baseline data in HEK293T 
(median 55% compared with 25% baseline in VO target sites in 
HEK293) and U2OS cells (median 57% compared with 14% base- 
line in lib-A, U2OS, Fig. 3e). We similarly validated 10 gRNAs for 
high-precision deletions with endogenous HTS in both cell types 
(Extended Data Table 2). Collectively, these observations establish the 
ability of inDelphi to identify, from sequence features alone, gRNAs 
that induce significantly more precise editing than the general pop- 
ulation of gRNAs. 
Fig. 3 | inDelphi accurately predicts nearly all editing outcomes. 
a, Fraction of endogenous editing products given predictions in HEK293 
(n = 86 target sites), HCT116 (n = 91) and K562 cells (n = 82). 
b, c, Predictive performance on endogenously observed frequencies of 
genotypes (b) and indel lengths (c) in HEK293 (median, 0.87 and 0.84), 
HCT116 (median, 0.87 and 0.85), and K562 (median, 0.83 and 0.79) cells. 
The box denotes the 25th, 50th and 75th percentiles, and whiskers show 
1.5 times the interquartile range. d, Comparison of predictions from two 
methods to observed frame frequencies (n = 86 target sites, HEK293 
cells), regression estimate ± 95% confidence intervals. e, 1-bp insertion 
frequencies among edited outcomes in U2OS and HEK293T cells (n = 27 
and 26 observations, baseline n = 1,958 and 89 target sites, P= 4.2 x 1078 
and 8.1 x 10 |’, respectively), two-sided Welch's t-test. f, Smoothed predicted 
distribution of the highest frequency indel among major editing outcomes 
(+1 to -60 indels) for SpCas9 gRNAs targeting the human genome. 


Template-free correction of pathogenic alleles 

We used inDelphi to identify new targets for therapeutic genome editing. 
Starting with 23,018 pathogenic short indels (ClinVar and HGMD 
databases), we used inDelphi to identify pathogenic alleles that are 
suitable for template-free Cas9-mediated editing to effect precise gain- 
of-function editing of the pathogenic genotype. We pursued two genetic 
disease allele categories that have not been previously identified as targets 
for Cas9-mediated repair: pathogenic frameshifts in which inDelphi pre- 
dicts that 50-90% of Cas9-mediated deletion products will correct the 
reading frame (mean baseline frequency of 34% among disease-associated 
frameshift mutations) and pathogenic microduplication alleles in which 
a short sequence duplication leads to a frameshift mutation or disrupts 
protein function and which inDelphi predicts can be repaired to wild-type 
genotype in a large fraction of Cas9 editing products (Fig. 4a). 

We selected 1,592 pathogenic human loci with high predicted rates 
of frame correction or microduplication correction to the wild-type 
sequence for inclusion in a second library (lib-B). We observed that 
183 human disease microduplication alleles included in lib-B were 
repaired to wild-type in >50% of all products (Fig. 4b), and 508 pathogenic 
human frameshift alleles were corrected into proper reading 
frames in >50% of all products in mESCs (Fig. 4c), in agreement with 
inDelphi’s predictions (r=0.64 and 0.64). We observed similar results 
in U2OS cells (r=0.65 for frame correction, r=0.61 for genotype cor- 
rection to wild type, Extended Data Fig. 5). Although repair to the 
wild-type genotype unambiguously restores wild-type protein function, 
we note that frame correction that alters the coding sequence requires 
case-by-case analysis to validate rescue of protein function. 
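
Why a microduplication is a natural substrate for repair to wild type can be shown with a small Python sketch: the two tandem copies act as a microhomology, and deleting one copy restores the original sequence. The sequences and the function name `collapse_microduplication` are hypothetical, and the sketch models only the sequence outcome of such a deletion, not its frequency.

```python
def collapse_microduplication(sequence, start, unit_length):
    """Delete one copy of a tandem duplication beginning at `start`.

    Mimics the product of a microhomology deletion in which the two copies
    of the duplicated unit anneal and one copy is removed.
    """
    first = sequence[start:start + unit_length]
    second = sequence[start + unit_length:start + 2 * unit_length]
    if len(second) < unit_length or first != second:
        raise ValueError("no tandem duplication of that length at this position")
    return sequence[:start + unit_length] + sequence[start + 2 * unit_length:]

# Hypothetical wild-type fragment and a pathogenic 4-bp duplication of 'AAGT'
wild_type = "ATGGCCAAGTTCGA"
pathogenic = wild_type[:10] + wild_type[6:10] + wild_type[10:]

repaired = collapse_microduplication(pathogenic, start=6, unit_length=4)
print(pathogenic, "->", repaired)
print("frameshift before repair:", (len(pathogenic) - len(wild_type)) % 3 != 0)
```

In this toy case the 4-bp duplication shifts the reading frame, and removing one copy returns exactly the wild-type sequence, which is the outcome the text distinguishes from frame correction that leaves the coding sequence altered.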

To determine whether the efficiency of microduplication repair can be 
increased by manipulation of DNA repair pathways, we performed Cas9 
cleavage of lib-B in four NHEJ-deficient conditions: Prkdc−/−Lig4−/− 
mESCs, and mESCs treated separately with DNA-dependent protein 
kinase inhibitor III (DPKi3), NU7026 and MLN4924. In NHEJ-impaired 
cells, the fraction of deletion outcomes not involving MH significantly 
decreased (median 23% to 10% with Prkdc−/−Lig4−/−, P=1.0 x 10-**, 
and 23% to 19% with DPKi3 and NU7041, P<5.5 x 107°) (Extended 
Data Fig. 6, Supplementary Discussion). In Prkdc−/−Lig4−/− mESCs, the 
increased propensity towards MH deletions enabled a subset of 
pathogenic alleles to be repaired to wild type with markedly high precision. 
Compared to wild-type mESCs in which 183 pathogenic alleles 
corrected to wild type in >50% of all edited products and 11 pathogenic 
alleles corrected to wild type in >70% of all edited products, in 
Prkdc−/−Lig4−/− mESCs, 286 pathogenic alleles corrected to wild type in >50% 
of all edited products and 153 pathogenic alleles corrected to wild type 
in >70% of products (Fig. 4d, Supplementary Table 1) without increase 
in the rate of apoptosis (Extended Data Fig. 6). DPKi3 or NU7041 
treatment also increased precise microduplication repair (Extended 
Data Figs. 5, 6). Taken together, impairing NHEJ can further increase 
the precision of wild-type correction for a large subset of pathogenic 
microduplications in genes such as PKD1 (corrected in 92% of edited 
Prkdc−/−Lig4−/− mESC alleles), MSH2 (88%) and LDLR (87%), supporting 
a model of competing end-joining repair mechanisms. 

We further tested inDelphi’s prediction of highly efficient correction 
in a functional assay with pathogenic LDLR microduplication alleles 
that cause dominantly inherited familial hypercholesterolemia. We 
separately introduced five pathogenic LDLR microduplication alleles 
within a full-length LDLR coding sequence upstream of a P2A-GFP 
cassette into the genome of human and mouse cells, such that 
Cas9-mediated repair to the wild-type LDLR sequence should induce 
phenotypic gain of LDL uptake and restore the reading frame of GFP. We 
then delivered Cas9 and a gRNA that is specific to each pathogenic allele 
and does not target the wild-type repaired sequence. We observed robust 
restoration of LDL uptake as well as restoration of GFP fluorescence in 
mESCs, U2OS cells and HCT116 cells in up to 79% of cells following 
transfection with Cas9 and inDelphi gRNAs (Fig. 4e, f, Extended Data 
Fig. 7). HTS confirmed efficient correction of these five LDLR 
microduplication alleles to wild type in human and mouse cells, as well as 
pathogenic microduplication alleles in the GAA, GLB1 and PORCN genes 
introduced to cells using the same method (Extended Data Table 3). 
Importantly, in these experiments, we observed high-frequency LDLR 
phenotypic correction when cutting with either SpCas9 or Staphylococcus 
aureus Cas9 (SaCas9) (Extended Data Table 3). 

Finally, we used precise template-free Cas9-mediated MMEJ to
correct pathogenic microduplication alleles endogenously in patient-
derived fibroblasts for Hermansky-Pudlak syndrome (HPS1 gene),
which causes blood clotting deficiency and albinism in patients and is
particularly common in Puerto Ricans, and Menkes disease (ATP7A
gene), which results in copper deficiency. Simultaneous delivery of
Cas9 and gRNA specific to the pathogenic microduplication allele 
induced high-efficiency correction to the wild-type sequence in HPS1 
(mean frequency, 88% of edited alleles, n =5 independent biological 
experiments) and ATP7A (frequency, 94% of edited alleles, n=2, 
Extended Data Table 3). These findings demonstrate the potential of
template-free, precise Cas9-nuclease-mediated repair of microdupli- 
cation alleles to achieve efficient repair to the wild-type sequence for 
therapeutic gain-of-function genome editing. 
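
The repair logic behind these corrections can be sketched in a few lines. A microduplication is a tandem repeat of a short unit, so the two copies serve as microhomology for each other, and deleting exactly one copy restores the wild-type sequence. The snippet below is an illustrative sketch rather than the authors' code: the function names and the flanking bases are hypothetical, and only the duplicated unit GGCTCGGA is taken from the LDLR c.526_533dup allele listed in the Methods.

```python
def make_microduplication(wild_type, start, end):
    """Return wild_type with wild_type[start:end] repeated in tandem.

    Coordinates are 0-based and end-exclusive.
    """
    unit = wild_type[start:end]
    return wild_type[:end] + unit + wild_type[end:]


def collapse_microduplication(allele, start, unit_len):
    """Delete one copy of a tandem repeat whose first copy starts at `start`.

    This mimics the microhomology-mediated deletion outcome in which the
    two identical copies anneal and one copy is lost.
    """
    first = allele[start:start + unit_len]
    second = allele[start + unit_len:start + 2 * unit_len]
    if first != second:
        raise ValueError("no tandem duplication of this length at this position")
    return allele[:start + unit_len] + allele[start + 2 * unit_len:]


# Hypothetical flanks around the duplicated unit of LDLR c.526_533dupGGCTCGGA.
UNIT = "GGCTCGGA"
wild_type = "AAGAT" + UNIT + "TCCAT"
pathogenic = make_microduplication(wild_type, 5, 5 + len(UNIT))
repaired = collapse_microduplication(pathogenic, 5, len(UNIT))

# An 8-bp duplication is not a multiple of 3, so it shifts the reading frame.
causes_frameshift = (len(pathogenic) - len(wild_type)) % 3 != 0
```

In this toy case the collapsed allele is identical to the wild-type string, which is the outcome the P2A-GFP reporter described above is designed to detect through restoration of the reading frame.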

Fig. 4 | Precise template-free correction of pathogenic alleles.

a, Efficient correction of a pathogenic allele to wild type. b, c, Comparison 
among pathogenic alleles of observed and predicted frequencies of repair 
to wild-type genotype (b) and frame (c). d, Wild-type repair frequencies of 
pathogenic alleles with predicted frequency >50% among all major editing 


Discussion 

We used the Cas9-mediated end-joining repair products of thousands 
of target DNA loci integrated into mammalian cells to train a machine 
learning model, inDelphi, that accurately predicts the spectrum of 
cut-site proximal genotypic products resulting from double-stranded 
break repair at a target DNA site of interest. The ability to predict Cas9- 
mediated products enables new precision genome editing research 
applications and facilitates existing applications, such as performing 
efficient bi-allelic gene knockout and predicting end-joining by-products
of HDR. We provide an online implementation of inDelphi to
predict the spectrum of Cas9-mediated products along with predicted
frameshift frequencies and precision at any target site
(https://indelphi.giffordlab.mit.edu/).
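
As a rough illustration of how a frameshift frequency follows from a predicted outcome spectrum, the sketch below sums the frequency of every indel whose net length change is not a multiple of three. It is not the inDelphi implementation; the function name and the example distribution are hypothetical.

```python
from typing import Mapping


def frameshift_frequency(indel_length_freqs: Mapping[int, float]) -> float:
    """Fraction of edited products that shift the reading frame.

    Keys are signed indel lengths (negative for deletions, positive for
    insertions); values are frequencies among edited products.
    """
    total = sum(indel_length_freqs.values())
    if total <= 0:
        raise ValueError("frequencies must sum to a positive value")
    # Python's % returns a non-negative remainder for negative lengths,
    # so -3 % 3 == 0 (in frame) and -7 % 3 == 2 (frameshift).
    shifted = sum(f for length, f in indel_length_freqs.items() if length % 3 != 0)
    return shifted / total


# Hypothetical spectrum: +1 insertion and deletions of 1, 3, 7 and 12 bp.
example = {1: 0.30, -1: 0.10, -3: 0.20, -7: 0.25, -12: 0.15}
fs = frameshift_frequency(example)
```

Here the 3-bp and 12-bp deletions stay in frame, so the frameshift frequency is the sum of the remaining three outcomes.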

The inDelphi model identifies target loci in which a substantial frac- 
tion of all repair products consist of a single genotype. Our findings 
suggest that 28-47% of SpCas9 gRNAs that target the human genome 
yield a single indel genotype in >30% of all major repair products 
(precision-30), and 5-11% yield a single indel genotype in >50% of 
all major repair products (precision-50). We show experimentally that 
precision template-free Cas9-mediated editing can mediate efficient 
gain-of-function repair at hundreds of pathogenic alleles including 
microduplications (Fig. 4b, e, f) in cell lines and in patient-derived 
primary cells (Extended Data Table 3). We note that each research or 
therapeutic Cas9-nuclease application may require a different level 
of precision depending on a variety of factors including risk/reward 
calculations of the gene and disease in question. 
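
The precision-30 and precision-50 criteria above can be expressed as a simple threshold on the most frequent genotype. The following is a minimal sketch under the assumption that a site's major repair products are given as a mapping from genotype label to frequency; the helper names and example sites are hypothetical.

```python
from typing import Mapping


def top_genotype_share(genotype_freqs: Mapping[str, float]) -> float:
    """Share of the most frequent genotype among all listed repair products."""
    total = sum(genotype_freqs.values())
    if total <= 0:
        raise ValueError("frequencies must sum to a positive value")
    return max(genotype_freqs.values()) / total


def meets_precision(genotype_freqs: Mapping[str, float], threshold_percent: float) -> bool:
    """True if a single genotype exceeds threshold_percent of repair products."""
    return top_genotype_share(genotype_freqs) * 100 > threshold_percent


# Hypothetical sites: site_a is dominated by one outcome, site_b is not.
site_a = {"1-bp insertion (T)": 0.55, "7-bp MH deletion": 0.25, "2-bp deletion": 0.20}
site_b = {"1-bp insertion (A)": 0.35, "4-bp MH deletion": 0.33, "1-bp deletion": 0.32}
```

With these numbers site_a satisfies both precision-30 and precision-50, whereas site_b satisfies only precision-30.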

Moreover, we present evidence that suppressing NHEJ augments 
repair of pathogenic microduplication alleles, suggesting that tempo- 
rary manipulation of DNA repair pathways could be combined with 
Cas9-mediated editing to favour specific editing genotypes with high 
precision. Genome editing currently lacks flexible strategies to correct
indels in post-mitotic cells because of the limited efficiency of
HDR in non-dividing cells. As MMEJ is thought to occur throughout
the cell cycle, inDelphi may provide access to predictable and
precise post-mitotic genome editing in a wider range of cell states.
Incorporating the frequencies of long deletions and translocations
into predictive models of Cas9 outcomes will be an important next step
to calculate the overall precision of Cas9-nuclease editing. We anticipate
that, given appropriate training data, inDelphi will also be able
to accurately predict repair genotypes from other designer nucleases.
This work establishes that the prediction and judicious application 
of template-free Cas9 nuclease-mediated genome editing offers new 
capabilities for the study and potential treatment of genetic diseases. 

METHODS


Data reporting. No statistical methods were used to predetermine sample size. 
The experiments were not randomized and the investigators were not blinded to 
allocation during experiments and outcome assessment. 

Library cloning. In brief, the cloning process involves ordering a library of oli- 
gonucleotides pairing a gRNA protospacer with its 55-bp target site, centred on 
an NGG PAM. To insert the gRNA hairpin between the gRNA protospacer and 
the target site, the library undergoes an intermediate Gibson Assembly circulari- 
zation step, restriction enzyme linearization and Gibson Assembly into a plasmid 
backbone containing a U6 promoter to facilitate gRNA expression, a hygromycin- 
resistance cassette and flanking Tol2 transposon sites to facilitate integration into 
the genome. 
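
To make the pairing of a protospacer with an NGG PAM concrete, the sketch below scans the forward strand of a sequence for NGG motifs and reports the adjacent protospacer and the expected SpCas9 cut position. It is illustrative only: the function name, the 20-nt protospacer length and the forward-strand-only scan are assumptions made here, and the cut position uses the standard SpCas9 convention of cleaving 3 bp upstream of the PAM.

```python
import re


def find_spcas9_sites(seq, protospacer_len=20):
    """List forward-strand SpCas9 target sites (protospacer + NGG PAM).

    cut_index is the 0-based index of the first base 3' of the cut,
    i.e. the cut falls between seq[cut_index - 1] and seq[cut_index].
    """
    seq = seq.upper()
    sites = []
    # A lookahead is used so that overlapping PAMs (e.g. in "GGG") are all found.
    for match in re.finditer(r"(?=([ACGT]GG))", seq):
        pam_start = match.start()
        if pam_start < protospacer_len:
            continue  # not enough upstream sequence for a full protospacer
        sites.append({
            "protospacer": seq[pam_start - protospacer_len:pam_start],
            "pam": match.group(1),
            "cut_index": pam_start - 3,
        })
    return sites


# Hypothetical 26-nt sequence with a single TGG PAM at index 20.
toy = "ACGTACGTACGTACGTACGT" + "TGG" + "ACA"
toy_sites = find_spcas9_sites(toy)
```

A complete design tool would also scan the reverse complement and extract the surrounding target-site window; both are omitted here for brevity.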

Specified pools of 2,000 oligonucleotides were synthesized by Twist Bioscience 
and amplified with NEBNext polymerase (New England Biolabs) using primers 
‘oligonucleotide library forward’ and ‘oligonucleotide library reverse’ (see below), 
to extend the sequences with overhangs complementary to the donor template used 
for circular assembly. To avoid overamplification in the library cloning process, we 
first performed qPCR by addition of SybrGreen Dye (Thermo Fisher) to determine 
the number of cycles required to complete the exponential phase of amplification. 
We ran the PCR reaction for half of the determined number of cycles at this stage. 
Extension time for all PCR reactions was extended to 1 min per cycle to prevent 
skewing towards GC-rich sequences. The 246-bp fragment was purified using a 
PCR purification kit (Qiagen). 

Separately, the donor template for circular assembly was amplified with 
NEBNext polymerase for 20 cycles from an SpCas9 sgRNA expression plasmid 
(Addgene 71485) using primers ‘circular donor forward’ and ‘circular donor
reverse’ (see below) to amplify the sgRNA hairpin and terminator, and extended 
further with a linker region meant to separate the gRNA expression cassette from 
the target site in the final library. The 146-bp amplicon was gel-purified (Qiagen) 
from a 2.5% agarose gel. 

The amplified synthetic library and donor templates were ligated by Gibson 
Assembly (New England Biolabs) in a 1:3 molar ratio for 1 h at 50°C, and unligated 
fragments were digested with Plasmid Safe ATP-Dependent DNase (Lucigen) for 1 h 
at 37°C. Assembled circularized sequences were purified using a PCR purification 
kit (Qiagen), linearized by digestion with SspI (New England Biolabs) for >3 h at 
37°C, and the 237-bp product was gel-purified (Qiagen) from a 2.5% agarose gel. 

The linearized fragment was further amplified with NEBNext polymerase using 
primers ‘plasmid insert forward’ and ‘plasmid insert reverse’ (see below) for the 
addition of overhangs complementary to the 5’ and 3’ regions of a
Tol2-transposon-containing gRNA expression plasmid (Addgene 71485) previously
digested with BbsI and XbaI (New England Biolabs), to facilitate gRNA expression
and integration of the library into the genome of mammalian cells. To avoid overamplification,
we performed qPCR by addition of SybrGreen Dye (Thermo Fisher) to determine 
the number of cycles required to complete the exponential phase of amplification, 
and then ran the PCR reaction for the determined number of cycles. The 375-bp 
amplicon was gel-purified (Qiagen) from a 2.5% agarose gel. 

The 375-bp amplicon and double-digested Tol2-transposon-containing gRNA
expression plasmid were ligated by Gibson Assembly (New England Biolabs) in a
3:1 ratio for 1 h at 50°C. Assembled plasmids were purified by isopropanol precip- 
itation with GlycoBlue Coprecipitant (Thermo Fisher) and reconstituted in milliQ 
water and transformed into NEB10beta (New England Biolabs) electrocompetent 
cells. Following recovery, a small dilution series was plated to assess transforma- 
tion efficiency and the remainder was grown in liquid culture in DRM medium 
overnight at 37°C. A detailed step-by-step library cloning protocol is provided in 
the Supplementary Methods. 

The plasmid library was isolated by Midiprep plasmid purification (Qiagen). 
Library integrity was verified by restriction digest with SapI (New England Biolabs) 
for 1 h at 37°C, and sequence diversity was validated by HTS as described below. 

Library cloning primers.
Oligonucleotide library forward: 5’-TTTTTGTTTTCTGTGTTCCGTTGTCCGTGCTGTAACGAAAGGATGGGTGCGACGCGTCAT-3’;
oligonucleotide library reverse: 5’-GTTGATAACGGACTAGCCTTATTTAAACTTGCTATGCTGTTTCCAGCATAGCTCTTAAAC-3’;
circular donor forward: 5’-GTTTAAGAGCTATGCTGGAAACAGC-3’;
circular donor reverse: 5’-ATGACGCGTCGCACCCATCCTTTCGTTACAGCACGGACAACGGAACACAGAAAACAAAAAAGCACCGACTC-3’;
plasmid insert forward: 5’-GTAACTTGAAAGTATTTCGATTTCTTGGCTTTATATATCTTGTGGAAAGGACGAAACACC-3’;
plasmid insert reverse: 5’-TTGTGGTTTGTCCAAACTCATCAATGTATCTTATCATGTCTGCTCGAAGCGGCCGTACCTCTAGATTCAGACGTGTGCTCTTCCGATCT-3’.

Cloning. A base plasmid was constructed starting from a Tol2-transposon-
containing plasmid (Addgene 71485). The sequence between Tol2 sites was
replaced with a CAGGS promoter, multi-cloning site, P2A peptide sequence
followed by eGFP sequence, and puromycin-resistance cassette to produce
p2T-CAG-MCS-P2A-GFP-PuroR. The full sequence of this plasmid is appended in
the ‘Sequences’ section of the Supplementary Methods, and this plasmid has been
submitted to Addgene as catalogue number 107186. Plasmids with this backbone
and containing wild-type and microduplication mutation versions of LDLR and
three other genes, GAA, GLB1, and PORCN, were constructed. Information on
cloning these genes is provided below, and the gene sequences are appended in
the Supplementary Methods.

LDLR: To generate p2T-CAGGS-LDLRwt-P2A-GFP-PuroR, LDLR (NCBI
gene ID 3949, transcript variant 1 CDS) was PCR-amplified from a base plasmid
ordered from the Harvard PlasmID resource core and cloned between the
BamHI and NheI sites of the base plasmid. The following mutants were generated
through InFusion (Clontech) cloning. Sequences are provided below, and our
internal allele nomenclature is in parentheses: LDLR: c.526_533dupGGCTCGGA
(LDLRdup252); LDLR: c.668_681dupAGGACAAATCTGAC (LDLRdup254/255);
LDLR: c.669_680dupGGACAAATCTGA (LDLRdup258);
LDLR: c.672_683dupCAAATCTGACGA (LDLRdup261);
LDLR: c.1662_1669dupGCTGGTGA (LDLRdup264).

PORCN: NCBI gene ID 64840, transcript variant CCDS was PCR-amplified
from HCT116 cDNA and cloned between the BamHI and NheI sites of the base
plasmid. PORCN: c.1059_1071dupCCTGGCTTTTATC (PORCNdup20) was generated
through InFusion cloning.

GLB1: NCBI gene ID 2720, transcript variant 1 CDS was PCR-amplified from 
HCT116 cDNA and cloned between the BamHI and NheI sites of the base plasmid.
GLB1: c.1456_1466dupGGTGCATATAT (GLB1dup84) was generated through 
InFusion cloning. 

GAA: NCBI gene ID 2548, transcript variant 1 CDS was PCR-amplified from a 
base plasmid ordered from the Harvard PlasmID resource core and cloned between 
the BamHI and NheI sites of the base plasmid. GAA: c.2704_2716dupCAGAAGGTGACTG
(GAAdup327/328) was generated through InFusion cloning.

SpCas9: CDS was amplified from p2T-CAG-SpCas9-BlastR and cloned
between the BamHI and NheI sites of the base plasmid by Gibson Assembly.

SpCas9 and KKH SaCas9 were constructed starting from a Tol2-transposon-
containing plasmid (Addgene 71485). The sequence between Tol2 sites was
replaced with a CAGGS promoter, Cas9 sequence, and blasticidin-resistance cas- 
sette to produce p2T-CAG-SpCas9-BlastR and p2T-CAG-KKHSaCas9-BlastR. 
These plasmids have been submitted to Addgene as catalogue numbers 107189 
and 107190. 

SpCas9 gRNAs were cloned as a pool into a Tol2-transposon-containing gRNA 
expression plasmid (Addgene 71485) using BbsI plasmid digest and Gibson
Assembly (New England Biolabs). SaCas9 gRNAs were cloned into a similar Tol2-
transposon-containing SaCas9 gRNA expression plasmid (p2T-U6-sgSaCas2 x
BbsI-HygR), which has been submitted to Addgene, using BbsI plasmid digest and
Gibson Assembly. Protospacer sequences used are listed below, using our internal 
nomenclature which matches the duplication alleles. 

LDLR gRNAs: sgsaLDLRdup252, 5’-GCTGCGAAGATGGCTCGGAGGC-3’;
sgsaLDLRdup254, 5’-GTGCAAGGACAAATCTGACAGG-3’;
sgsaLDLRdup255, 5’-GTTCCTCGTCAGATTTGTCCTG-3’;
sgsaLDLRdup258, 5’-GACTGCAAGGACAAATCTGAGG-3’;
sgsaLDLRdup261, 5’-GTTTTCCTCGTCAGATTTGTCG-3’;
sgspLDLRdup264, 5’-GACATCTACTCGCTGGTGAGC-3’.
PORCN gRNAs: sgspPORCNdup20, 5’-GCTGTCCCTGGCTTTTATCCC-3’.
GLB1 gRNAs: sgspGLB1dup84, 5’-GTGTGAACTATGGTGCATATA-3’.
GAA gRNAs: sgsaGAAdup327, 5’-GCAGCTGCAGAAGGTGACTGCA-3’;
sgspGAAdup328, 5’-GCTGCAGAAGGTGACTGCAGA-3’.

Cell culture. mESC lines used have been described previously and were cultured
as described previously. HEK293T, HCT116, and U2OS cells were purchased
from ATCC and cultured as recommended by ATCC. The following cell lines were 
obtained and cultured as recommended from the NIGMS Human Genetic Cell 
Repository at the Coriell Institute for Medical Research: GM14609 Hermansky- 
Pudlak Syndrome 1 (HPS1) fibroblasts and GM13672 Menkes Syndrome fibro- 
blasts. Cell lines were authenticated by the suppliers and tested negative for 
mycoplasma. 

For stable Tol2 transposon plasmid integration, cells were transfected 
using Lipofectamine 3000 (Thermo Fisher) following standard protocols with 
equimolar amounts of Tol2 transposase plasmid (a gift from K. Kawakami) and
transposon-containing plasmid. For library applications, 15-cm plates with >10⁷
initial cells were used, and for single gRNA targeting, 6-well plates with >10⁶ initial
cells were used. To generate lines with stable Tol2-mediated genomic integration,
selection with the appropriate selection agent at an empirically defined
concentration (blasticidin, hygromycin, or puromycin) was performed starting 24 h
after transfection and continuing for >1 week. In cases where sequential plasmid
integration was performed such as integrating gRNA/target library and then Cas9 
or microduplication plasmid and then Cas9 plus gRNA, the same Lipofectamine 
3000 transfection protocol with Tol2 transposase plasmid was performed 
each time, and >1 week of appropriate drug selection was performed after each 
transfection. 

For SpCas9 targeting experiments, cells were transduced with a single lentivirus
containing an SpCas9 and sgRNA expression cassette to target SpCas9 cleavage to
either the HPS1: c.1472_1487dup16 or ATP7A: c.6913_6917dupCTTAT microduplication
locus for use in HPS1 and Menkes Syndrome fibroblasts, respectively. The
lentiviral plasmids were obtained from Sigma-Aldrich (LV01) and lentivirus was
produced by the Boston Children’s Hospital Viral Core. Fibroblasts were plated in
12-well plates at 12,500 cells cm⁻² one day before transduction. Cells were treated
with 10-20 µl of virus in the presence of 8 µg ml⁻¹ Polybrene (Sigma-Aldrich) on
two consecutive days and collected on day 10 after transduction.

Apoptosis analysis. Wild-type and Prkdc−/−Lig4−/− mESCs with stably
integrated lib-A were transfected with p2T-CAG-SpCas9-P2A-GFP-PuroR
using Lipofectamine 3000 following standard protocols in 6-well plates with
10⁶ cells. After 24 h, cells were stained with Annexin V Alexa Fluor 568
conjugate (Thermo Fisher) according to the manufacturer’s protocols. Fluorescence
was detected on a Cytoflex LX (Beckman Coulter) and analysed using FlowJo 
(FlowJo LLC). 

Deep sequencing. Genomic DNA was collected from cells after >1 week of selection.
For library samples, 16 µg gDNA was used for each sample; for individual
locus samples, 2 µg gDNA was used; for plasmid library verification, 0.5 µg purified
plasmid DNA was used.

For individual locus samples, the locus surrounding CRISPR-Cas9 mutation 
was PCR-amplified in two steps using primers >50 bp from the Cas9 target site.
PCR1 was performed using the primers specified below. PCR2 was performed 
to add full-length Illumina sequencing adapters using the NEBNext Index 
Primer Sets 1 and 2 (New England Biolabs) or internally ordered primers with 
equivalent sequences. All PCRs were performed using NEBNext polymerase. 
Extension time for all PCR reactions was extended to 1 min per cycle to pre- 
vent skewing towards GC-rich sequences. The pooled samples were sequenced 
using NextSeq (Illumina) at the Harvard Medical School Biopolymers Facility, 
the MIT BioMicro Center, or the Broad Institute Sequencing Facility. The library
prep primers were as follows.
For LDLRDup252, 254, 255, 258, 261: 120417_LDLRDup254_r1seq_A, 5’-CTTTCCCTACACGACGCTCTTCCGATCTNNNACTCCAGCTGGCGCTGTGAT-3’; 120417_LDLR254_r2seq_A, 5’-GGAGTTCAGACGTGTGCTCTTCCGATCTCAACTTCATCGCTCATGTCCTTG-3’.
For LDLRDup264: 120817_LDLR264_r1seq_B, 5’-CTTTCCCTACACGACGCTCTTCCGATCTNNNAACTCCCGCCAAGATCAAGAAAG-3’; 120817_LDLR264_r2seq_B, 5’-GGAGTTCAGACGTGTGCTCTTCCGATCTCAGCCTCTTTTCATCCTCCAAGA-3’.
For PORCNDup20: 120517_PORCN20_r1seq, 5’-CTTTCCCTACACGACGCTCTTCCGATCTNNNCCTCCTACATGGCTTCAGTTTCC-3’; 120517_PORCN20_r2seq, 5’-GGAGTTCAGACGTGTGCTCTTCCGATCTCCAGAGCTCCAAAGAGCAAGTTT-3’.
For GLB1Dup84: 120517_GLB184_r1seq, 5’-CTTTCCCTACACGACGCTCTTCCGATCTNNNAGCCACTCTGGACCTTCTGGTA-3’; 120517_GLB184_r2seq, 5’-GGAGTTCAGACGTGTGCTCTTCCGATCTCCAGTCCGTGAGGATATTGGAAC-3’.
For GAADup327/328: 120517_GAA327_r1seq, 5’-CTTTCCCTACACGACGCTCTTCCGATCTNNNGATCGTGAATGAGCTGGTACGTG-3’; 120517_GAA327_r2seq, 5’-GGAGTTCAGACGTGTGCTCTTCCGATCTAACAGCGAGACACAGATGTCCAG-3’.

General HTS data analysis and computational modelling. A detailed and thor- 
ough description of methods used for data analysis and computational modelling 
is available in the Supplementary Methods. 

Statistical analysis and reproducibility. Python 2.7 and 3.6 were used to analyse 
data and perform statistical tests using the SciPy library. Data are represented 
as mean ± standard error of the mean with 95% confidence intervals. In box plots,
box segments show median, 25th and 75th percentiles, whiskers above and below 
show 1.5 times the interquartile range. Higher and lower points (outliers) are 
plotted individually or not plotted. Comparison of means of two independent 
groups was performed using two-sided two-sample t-tests, for which the validity 
of the normal assumption was analysed using the Shapiro-Wilk tests for small 
data (n < 50 samples) and/or using the Kolmogorov-Smirnov test on larger data
(n > 50) directly, and/or using the Kolmogorov-Smirnov test on bootstrapped 
means (n= 1,000 bootstrapped samples). In all significance tests performed in 
the study, the data satisfied our normality criteria for t-tests. For comparison of 
two independent groups, two-sided two-sample t-tests were used for normally 
distributed data with equal or similar variance (Student’s t-test) or unequal and 
dissimilar variance (Welch's t-test). A critical value for significance of P< 0.05 was 
used throughout the study. 
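
The quantities reported alongside each test (test statistic, degrees of freedom and Hedges' g) can be reproduced with the standard formulas. The sketch below uses only the Python standard library and is not taken from the study's analysis code; the function names and the toy data are hypothetical. For the P value itself, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs Welch's test in SciPy.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)  # sample variances (n - 1 denominator)
    se2 = va / na + vb / nb
    t = (mean(a) - mean(b)) / math.sqrt(se2)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


def hedges_g(a, b):
    """Standardized mean difference with the small-sample bias correction."""
    na, nb = len(a), len(b)
    pooled_sd = math.sqrt(
        ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    )
    cohen_d = (mean(a) - mean(b)) / pooled_sd
    correction = 1 - 3 / (4 * (na + nb) - 9)  # approximate correction factor
    return cohen_d * correction


# Hypothetical toy groups with unequal variance.
group_a = [1, 2, 3, 4, 5]
group_b = [2, 4, 6, 8, 10]
t_stat, dof = welch_t(group_a, group_b)
g = hedges_g(group_a, group_b)
```

The non-integer degrees of freedom reported for the Welch tests in the list that follows arise from the Welch-Satterthwaite approximation used in `welch_t`.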

Here, we report detailed statistical parameters (P value, name of statistical test, 
test statistic value, degrees of freedom (d.f.), effect size) for all significance tests 
performed in the study. 

Figure 2b, comparison of 1-bp insertion frequencies among Cas9-edited products
from 1,996 lib-A target sites. *P=5.4 x 10°-*°; **P=8.6 x 10-”, two-sided
two-sample t-test, statistic = —13.0 and — 18.4, d.f.=777 and 1,994; Hedges’s 
g=0.94 and 0.85, for * and **, respectively. 

Figure 2e, comparison of the 1-bp insertion frequency at sequences in 
Fig. 2c with varying positions —4 and —3. Box plot as in Figure 2b. *P=0.03; 
**P = 2.98 × 10⁻⁷, two-sided two-sample t-test, statistic = −2.2 and −6.5,
d.f.=185 and 32, Hedges’s g= 0.58 and 2.3, for * and **, respectively. 

Figure 3e, comparison of 1-bp insertion frequencies among edited outcomes 
in U2OS cells (n = 27 observations, baseline n = 1,958 target sites, P = 4.2 × 10⁻⁸,
two-sided Welch’s t-test, test statistic = 7.56, d.f. = 27.78, Hedges’ g = 1.47) and
HEK293T cells (n = 26 observations, baseline n = 89 target sites, P = 8.1 × 10⁻¹²,
two-sided Welch’s t-test, test statistic = 10.40, d.f. = 34.14, Hedges’ g = 2.89).

Extended Data Fig. 3g, box plots displaying total deletion phi score and 1-bp 
insertion frequencies in mESCs for 312 ‘4-bp’ target sites and 89 VO sequences. 
*P=6.1 x 10~; two-sided two-sample t-test, test statistic = —5.94, d.f.=399, 
Hedges’ g effect size = 0.49. 

Extended Data Fig. 4f, distribution of predicted frameshift frequencies among 
1-60-bp deletions for SpCas9 gRNAs targeting exons (n = 1,000,294 gRNAs; 
mean, 66.4%) and shuffled versions (mean, 69.3%), and introns (n= 740,759) in 
the human genome. Dashed lines indicate means. ***P< 10 °°, two-sided Welch’s 
t-test, test statistic = −145.5, d.f. = 1,506,304, Hedges’ g = −0.19.

Extended Data Fig. 6a, comparison of microhomology deletions among all dele- 
tions at lib-B target sites in wild type (n= 1,909 target sites), DPKi3 (n= 1,999), 
MLN4924 (n = 1,995), NU7026 (n = 1,999), and Prkdc−/−Lig4−/− (n = 1,446).
Statistical tests performed against wild-type population, Welch’s two-sided
two-sample t-test. *P = 5.6 × 10⁻⁵, test statistic = 4.0, d.f. = 3,870.8, Hedges’ g
effect size = −0.13. **P = 3.5 × 10⁻¹³, test statistic = 7.3, d.f. = 3,890.8, Hedges’ g
effect size = −0.23. ***P = 5.0 × 10⁻⁴¹, test statistic = 13.6, d.f. = 2,651.6, Hedges’
g effect size = −0.46.

Extended Data Fig. 6b, comparison of the frequency of each class of microho- 
mology-less deletions among all deletion products in wild-type (lib-A and lib-B 
target sites, n = 3,829 target sites), DPKi3 (lib-B, n= 1,990), MLN4924 (lib-B, 
n = 1,980), NU7026 (lib-B, n = 1,992) and Prkdc−/−Lig4−/− (lib-A and lib-B target
sites, n = 3,344). P values are compared to wild type, two-sided Welch’s t-test.
Comparing among unilateral top strand joining, wild type versus Prkdc−/−Lig4−/−
(P=1.1x 10~*|, test statistic = 20.65, d.f. = 6,223.97, Hedges’ g= 0.50), ver- 
sus NU7026 (P= 4.3 x 1078, test statistic = 5.50, d.f. = 2,798.38, Hedges’ 
g=0.18). Comparing among unilateral bottom strand joining, wild type versus 
Prkdc−/−Lig4−/− (P=4.1 x 10-®, test statistic = 17.65, d.f.= 6,479.88, Hedges’
g=0.42), versus NU7026 (P=7.7 x 10~°, test statistic = 4.48, d.f. = 2,868.90, 
Hedges’ g= 0.50). Comparing among medial joining, wild type versus MLN4924 
(P=4.6 x 10-*, test statistic = 10.43, d.f. = 3,240.16, Hedges’ g= 0.31), versus 
DPKi3 (P=4.8 x 10°-~, test statistic =9.72, df. =3,231.41, Hedges’ g= 0.29), ver- 
sus NU7026 (P=4.6 x 10°71, test statistic= 9.49, d.f. = 3,130.82, Hedges’ g=0.29). 

Extended Data Fig. 7f, box plot comparing observed 1-bp insertion frequency 
in lib-A and 12 pathogenic alleles selected by inDelphi in mESCs (combined data 
from n=2 independent biological replicates). The box denotes the 25th, 50th and 
75th percentiles, whiskers show 1.5 times the interquartile range, and outliers are 
depicted d.f.= 11.18, Hedges’ g effect size = 1.47. 

Reporting summary. Further information on research design is available in 
the Nature Research Reporting Summary linked to this paper. 

Code availability. All data processing, analysis and modelling code is available 
at www.github.com/gifford-lab/inDelphi-dataprocessinganalysis. The inDelphi 
model is available online at https://indelphi.giffordlab.mit.edu/. 


Data availability 

High-throughput sequencing data have been deposited in the NCBI Sequence Read
Archive database under accession codes SRP141261 and SRP141144. Processed
data have been deposited under the following DOIs:
https://doi.org/10.6084/m9.figshare.6838016,
https://doi.org/10.6084/m9.figshare.6837959,
https://doi.org/10.6084/m9.figshare.6837956,
https://doi.org/10.6084/m9.figshare.6837953 and
https://doi.org/10.6084/m9.figshare.6837947.

Extended Data Fig. 1 | Design and cloning of a high-throughput library
to assess CRISPR-Cas9-mediated editing products, yielding diverse
and replicate-consistent data that is concordant with repair spectra at
endogenous human genomic loci. a, Empirical distributions of various
predicted and measured properties of DNA from 169,279 SpCas9 gRNA
target sites in the human genome. The number of target sites per range used
to design lib-A is indicated. b, Cumulative percentage of endogenous
deletions in VO target sites in HEK293 (n = 89 target sites), HCT116
(n = 92) and K562 (n = 86) cells that delete up to the reported number of
nucleotides (x axis). c, Schematic of the cloning process used to clone lib-A
and lib-B (Methods, Supplementary Discussion, Supplementary Methods).
d, Number of unique high-confidence editing outcomes (Supplementary
Methods) called by simulating data subsampling in data in lib-A
(n = 2,000 target sites) in mESCs (combined data from n = 3 independent
biological replicates) and U2OS cells (combined data from n = 2
independent biological replicates). For ‘all’, the original non-subsampled
data are presented. Each box depicts data for 2,000 target sites. Outliers are
not depicted. e, Pearson's r of genotype frequencies comparing lib-A in
mESCs and U2OS cells with endogenous data in HEK293 (n = 87 target
sites), HCT116 (n = 88), and K562 (n = 86) cells. Outliers are depicted as
diamonds. 1-bp insertion frequency adjustment was performed at each
target site by proportionally scaling them to be equal between two cell
types. f, Pearson's r of genotype frequencies at lib-A target sites, comparing
two independent biological replicate experiments in mESCs (n = 1,861
target sites, median r = 0.89) and U2OS cells (n = 1,921, median r = 0.77).
Outliers are depicted as diamonds. Box plots denote the 25th, 50th and
75th percentiles and whiskers show 1.5 times the interquartile range.

Extended Data Fig. 2 | Categorizing and modelling Cas9-mediated
DNA repair products with manual data-analysis and automated
machine learning through inDelphi. a, b, Categories of Cas9-mediated
genotypic outcomes in data from endogenous contexts at VO target sites
in K562 (n = 88 target sites), HCT116 (n = 92), HEK293 (n = 89) cells
(collectively, a) and U2OS cells (b, n = 1,958 lib-A target sites).
c, Categories and defined properties (Supplementary Methods) of all
sequence alignments consistent with a Cas9-mediated 7-bp deletion.
d, Hypothesized mechanisms for template-free DNA repair at Cas9-mediated
DSBs based on components of the classical NHEJ, alternative
NHEJ or MMEJ pathways (Supplementary Discussion). e, Function
learned for modelling MH deletions (Supplementary Methods).
f, Function learned for modelling MH-independent deletions
(MHless-NN) mapping deletion length to a numeric score (psi,
Supplementary Methods, point plot) and with deletion length penalty
normalized to sum to 1 (phi, Supplementary Methods, histogram).

Extended Data Fig. 3 | Influential role of hyperlocal sequence context
features in predicting and causing 1-bp insertions. a, Frequency of
1-bp insertions in mESCs (n = 1,981 lib-A target sites) and U2OS cells
(n = 1,918) with varying −4 nucleotides. b, c, Plot of 1-bp insertion
frequency in mESCs (n = 1,996 lib-A target sites) and U2OS cells
(n = 1,966) compared to their total phi score (b) and predicted deletion
length precision score (c) with Pearson's r. d, Comparison of 1-bp
insertion frequencies among all edited products from 1,966 lib-A target
sites in U2OS cells (combined data from n = 2 independent biological
replicates). e, Nucleotides and their effect on the frequency of 1-bp
insertions in U2OS cells. Only bases with non-zero linear regression
weights in 10,000-fold iterative cross-validation are shown. Total n = 1,966
lib-A target sites. f, Insertion frequency in mESCs (n = 205) and U2OS
cells (n = 217) when varying four bases by the cleavage site (positions −5
to −2 counted from the NGG-PAM at positions 0-2) contained within
three target sites designed with weak microhomology. g, Microhomology
strength (deletion phi score) and 1-bp insertions in mESCs for 312 ‘4-bp’
target sites and 89 VO sequences. *P=6.1 x 10°; two-sided two-sample 
t-test, test statistic = —5.94, d.f. = 399, Hedges’ g effect size = 0.49. Box 
plots denote the 25th, 50th and 75th percentiles, whiskers show 1.5 times 
the interquartile range, and outliers are depicted as diamonds. 


© 2018 Springer Nature Limited. All rights reserved. 


[Extended Data Fig. 4 image. Recoverable labels: pie chart of outcome classes (microhomology deletions, 58%; 1-bp insertions, 9%) with inDelphi prediction resolution (single-base (genotypic), 67%; indel length, 25%; not predicted, 8%). a, b, mESCs (N = 189) and U2OS cells (N = 185): median r = 0.94 and 0.88 (a), 0.91 and 0.91 (b). Frameshift comparison panels: inDelphi, r = 0.74 and r = 0.76; Microhomology-Predictor, r = 0.50 and r = 0.26; axis label, endogenously observed reading frame frequencies among all edited products in HCT116 cells (%).]


Extended Data Fig. 4 | inDelphi predictions represent nearly all editing
outcomes and are accurate at predicting the frequencies of genotypes,
indel lengths, and frameshift frequencies. a, b, Pearson's r for held-out
lib-A target sites comparing inDelphi predictions with observed
frequencies for genotypes (a) and indel lengths (b) in mESCs and U2OS
cells. The box denotes the 25th, 50th and 75th percentiles, whiskers
show 1.5 times the interquartile range. Densities were smoothed with
noise but do not extend beyond the data. c, Pie chart depicting the output
of inDelphi for specific outcome classes at lib-A target sites in mESCs.
d, e, Comparison of two methods for frameshift predictions to observed
values with Pearson's r in HCT116 cells (d, n = 91 target sites) and K562
cells (e, n = 82 target sites). The error band represents the 95% confidence
intervals around the regression estimate with 1,000-fold bootstrapping.
f, Distribution of predicted frameshift frequencies among 1-60-bp
deletions for SpCas9 gRNAs targeting exons (n = 1,000,294 gRNAs;
mean = 66.4%) and shuffled versions (mean, 69.3%), and introns
(n = 740,759) in the human genome. Dashed lines indicate means.

#8 D < 1030 two-sided Welch’s t-test, test statistic= —145.5, 

d.f. = 1,506,304, Hedges’ g= —0.19. 
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Extended Data Fig. 6 | Altered distributions of Cas9-mediated 
genotypic products in Prkdc~/—Lig4—'— mESCs and mESCs treated 
with DPKi3, NU7026, and MLN4924 compared to wild-type mESCs. 

a, Comparison of MH deletions among all deletions at lib-B target sites in
wild-type cells (n = 1,909 target sites), cells treated with DPKi3 (n = 1,999),
MLN4924 (n = 1,995) or NU7026 (n = 1,999) and Prkdc−/−Lig4−/− cells
(n = 1,446). Statistical tests performed against wild-type population.
*P=5.6 x 107%, **P=3.5 x 1071, ***P=5.0 x 10741, two-sided 

Welch's t-test. b, Comparison of the frequency of each class of MH-less
deletions among all deletion products in wild-type (lib-A and lib-B target
sites, n = 3,829 target sites), DPKi3 (lib-B, n = 1,990), MLN4924 (lib-B,
n = 1,980), NU7026 (lib-B, n = 1,992) and Prkdc−/−Lig4−/− (lib-A and
lib-B target sites, n = 3,344). P values are compared to wild-type,
two-sided Welch's t-test. c, Frequency of 1-bp insertions at 1,055 target
sites in lib-A in Prkdc−/−Lig4−/− mESCs. d, Frequencies of deletion repair
to wild-type genotype in lib-B in wild-type mESCs (n = 1,480 target sites,
combined data from two technical replicates) compared to conditions,
with combined data from two independent biological replicates for
each of Prkdc−/−Lig4−/− (n = 1,041 target sites), MLN4924 (n = 1,569),
NU7026 (n = 1,561) and DPKi3 (n = 1,563). e, Table of Pearson's r of the
change in disease correction frequency compared to wild-type at n = 791
target sites for each pair of conditions. f, g, Annexin V-568 staining flow
cytometry contour plots (f) and mean ± standard deviation values (g) in
wild-type and Prkdc−/−Lig4−/− lib-A mESCs following transfection with
SpCas9-P2A-GFP (representative data for n = 2 experiments). Box plots
denote the 25th, 50th and 75th percentiles, whiskers show 1.5 times the
interquartile range, and outliers are depicted as diamonds. For detailed
statistics on significance tests, see Methods.

Extended Data Fig. 7 | Template-free Cas9-nuclease editing of human
and mouse cells containing pathogenic alleles. a, b, Flow cytometric
contour plots showing GFP fluorescence and LDL-Dylight550 uptake
in (a) and fluorescence microscopy of (b) HCT116 cells containing the
denoted LDLR alleles and treated with SaCas9 and gRNA when denoted
(representative data for n = 2 experiments). c, Fluorescence microscopy of
U2OS cells containing the denoted LDLR alleles and treated with SaCas9
and gRNA when denoted (representative data for n = 2 experiments).
d, e, Flow cytometry gating strategy used for mESC and LDLRdup-P2A-GFP
untreated (d) and treated with SpCas9 and gRNA (e). f, g, Results
of 12 pathogenic 1-bp deletion alleles selected by inDelphi for high 1-bp
insertion frequency (combined data from n = 2 independent biological
replicates) compared to lib-A (f) and presented in a table (g). The box
denotes the 25th, 50th and 75th percentiles, whiskers show 1.5 times the
interquartile range, and outliers are depicted as diamonds. *P = 1.6 × 10⁻⁴,
two-sided Welch's t-test. For detailed statistics, see Methods. In the table,
the most frequent 1-bp insertion genotype predicted by inDelphi that does
not correspond to the wild-type genotype is indicated by an asterisk. In
fluorescence microscopy plots, GFP fluorescence is shown in green,
LDL-Dylight550 uptake in red, and Hoechst staining nuclei in blue.

Extended Data Table 1 | Frequency of gRNAs in the human genome with denoted Cas9-mediated outcome precision


               inDelphi trained on Lib-A data from mESCs             inDelphi trained on Lib-A data from U2OS cells
               for 1-bp ins. module                                  for 1-bp ins. module
Precision-X    Precise product   Precise product   Total % of        Precise product   Precise product   Total % of
threshold (%)  is a deletion     is a 1-bp         gRNAs that        is a deletion     is a 1-bp         gRNAs that
               (% of gRNAs)      insertion         are precise-X     (% of gRNAs)      insertion         are precise-X
                                 (% of gRNAs)                                          (% of gRNAs)
10             82                38                93                70                78                97
15             61                23                75                44                64                87
20             43                15                55                27                53                72
25             30                10                39                17                44                58
30             21                6.6               28                11                36                46
35             15                4.4               19                6.9               28                34
40             10                2.9               13                4.1               21                25
45             6.5               1.9               8.4               2.4               15                18
50             4.3               1.3               5.6               1.4               10                12
55             2.8               0.8               3.6               0.8               6.7               7.5
60             1.8               0.5               2.3               0.5               4.0               4.4
65             1.1               0.3               1.5               0.2               2.2               2.4
70             0.7               0.2               0.9               0.1               1.1               1.2
75             0.4               0.1               0.5               0.04              0.5               0.5
80             0.2               0.08              0.3               0.01              0.2               0.2
85             0.08              0.04              0.1               0.003             0.07              0.08
90             0.03              0.02              0.05              0.0007            0.03              0.03

SpCas9 gRNAs in human exons and introns in mESCs (n = 1,003,524 SpCas9 gRNAs) and U2OS cells (n = 4,498,780 SpCas9 gRNAs). Predictions were smoothed with Gaussian noise (Supplementary
Methods).

Extended Data Table 2 | Endogenous repair of 24 designed high-precision gRNAs in human cell lines


Observed frequency among all edited products from deep sequencing at endogenous loci (%)

Gene          Exon/chr: cutsite    Frameshift,    Most frequent     Frameshift,    Most frequent
              (hg19)               U2OS           genotype, U2OS    HEK293T        genotype, HEK293T
VEGFA         exon: 458            72, 72         9, 11             81, 71         28, 9
[illegible]   exons: 2             91, 91         49, 52*           91, 91         49, 23*
PDCD1         exon5: 208           90, 90         20, 22            91, 91         29, 13
APOB          exon25: 147          83, 83         22, 21            87, 85         36, 17
VEGFA         exon3: 127           86, 89         28, 30            92, 91         56, 32
CCR5          exon1: 1941          83, 81         20, 21            86, 84         43, 27
CD274         exon2: 271           85, 86         9, 10             84, 82         31, 14
APOB          exon26: 5590         91, 89         30, 27            89             40
VEGFR2        exon26: 19           82, 82         35, 33            83, 82         41, 23
CXCR4         exon1: 825           86, 86         32, 33            91             55
PCSK9         exon11: 15           [illegible]    [illegible]       [illegible]    [illegible]
CCR5          exon: 885            90, 91         74, 71            78             65
CCR5          exon: 1027           92, 94         62, 62            91, 92         50, 60
APOB          exon26: 5573         93, 93         75, 74            93, 95         69, 82
[illegible]   exon1: 61            94, 92         21, 16            84, 88         19, 28[illegible mark]
CCR5          exon1: 1577          81, 81         29, 30            80, 84         29, 46
APOB          exon22: 100          89, 90         28, 31            90, 89         26, 40
APOBEC3B      exon3: 202           [illegible]    [illegible]       [illegible]    [illegible]
MMACHC        chr1: 45973892       97, 98         81, 77            97, 98         79, 86
PROK2         chr3: 71821967       92, 98         45, 45            92, 93         49, 58
IDS           chrX: 148564700      96, 95         73, 76            93, 95         63, 79
ECM1          [illegible]          [illegible]    47, 52            88, 89         33, 37
KCNH2         chr7: 150644566      [illegible]    [illegible]       89, 93         71, 75
[illegible]   chr19: 11222303      91, 92         79, 78†‡          90, 96         78, 84†‡

Data from up to two independent biological replicates are depicted.
*Deletion.
†Insertion.
‡Pathogenic 1-bp insertion allele from ClinVar or HGMD.

Extended Data Table 3 | Repair of ten pathogenic microduplication alleles in individual cellular experiments


Pathogenic allele
  LDLRdup1  LDLRdup2  LDLRdup2  LDLRdup3  LDLRdup4  LDLRdup5  PORCNdup  GAAdup    GAAdup    GLB1dup   HPS1dup   ATP7Adup
#AlleleID
  245617    245706    245706    245709    245715    246266    25739     354180    354180    98805     ND        ND
Predicted frequency of deletions restoring frame (%)
  79        98        96        95        86        94        90        76        93        95        ND        ND
Flow cytometric frameshift frequency (%)
  57        95        57        90        72        87        ND        7?        74        85        ND        ND
Predicted frequency of repair to wild-type genotype among all major editing products (%)
  72        90        83        94        85        86        89        74        91        79        88        43
Flow cytometric phenotypic repair frequency, mESC (%)
  36        69        30        53        33        78        ND        ND        ND        ND        ND        ND
Observed frequency of repair to wild-type genotype among all edited products in HTS, mESC (%)
  ND        67        39        25        15        65        48        76        59        42        ND        ND
Observed frequency of repair to wild-type genotype among all edited products in HTS, U2OS (%)
  100       88        ND        ND        ND        77        ND        ND        ND        ND        ND        ND
Observed frequency of repair to wild-type genotype among all edited products in HTS, HCT116 (%)
  ND        ND        ND        24        ND        89        ND        ND        ND        ND        ND        ND
Observed frequency of repair to wild-type genotype among all edited products in Lib-B, mESCs (%)
  ND        ND        ND        ND        ND        58        42        ND        63        41        ND        ND
Observed frequency of repair to wild-type genotype among all edited products in primary patient fibroblasts (%)
  ND        ND        ND        ND        ND        ND        ND        ND        ND        ND        88 ± 14*  98

gRNA sequence and Cas9 type (same column order as above)
  LDLRdup1  TGCGAAGATGGCTCGGAGGC  KKH SaCas9
  LDLRdup2  GCAAGGACAAATCTGACAGG  KKH SaCas9
  LDLRdup2  TCCTCGTCAGATTTGTCCTG  KKH SaCas9
  LDLRdup3  CTGCAAGGACAAATCTGAGG  KKH SaCas9
  LDLRdup4  TTTCCTCGTCAGATTTGTCG  KKH SaCas9
  LDLRdup5  ACATCTACTCGCTGGTGAGC  SpCas9
  PORCNdup  CTGTCCCTGGCTTTTATCCC  SpCas9
  GAAdup    AGCTGCAGAAGGTGACTGCA  KKH SaCas9
  GAAdup    CTGCAGAAGGTGACTGCAGA  SpCas9
  GLB1dup   TGTGAACTATGGTGCATATA  SpCas9
  HPS1dup   CAGCAGGGGAGGCCCCCAGC  SpCas9
  ATP7Adup  TTTTTCCATATAAGATAAGA  SpCas9

ND, not determined. LDLRdup1, LDLR:c.526_533dupGGCTCGGA. LDLRdup2, LDLR:c.668_681dupAGGACAAATCTGAC. LDLRdup3, LDLR:c.669_680dupGGACAAATCTGA. LDLRdup4,
LDLR:c.672_683dupCAAATCTGACGA. LDLRdup5, LDLR:c.1662_1669dupGCTGGTGA. PORCNdup, PORCN:c.1059_1071dupCCTGGCTTTTATC. GAAdup, GAA:c.2704_2716dupCAGAAGGTGACTG. GLB1dup,
GLB1:c.1456_1466dupGGTGCATATAT. HPS1dup, HPS:c.1472_1487dupCCTCCCCTGCTGGGGG. ATP7Adup, ATP7A:c.6913_6917dupCTTAT.

*n = 5.
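
The table above sets predicted repair frequencies beside observed ones, and the captions in this section summarize such comparisons with Pearson's r. As a rough illustration only, the sketch below computes Pearson's r for the alleles where both the predicted wild-type repair frequency and the HTS-observed frequency in mESCs are given (ND entries are skipped). The function name `pearson_r` and the variable names are hypothetical, the values are transcribed from the two table rows, and the resulting coefficient is not a statistic reported by the authors.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Values transcribed from the table rows (None marks "ND").
predicted_wt_repair = [72, 90, 83, 94, 85, 86, 89, 74, 91, 79, 88, 43]
observed_hts_mesc = [None, 67, 39, 25, 15, 65, 48, 76, 59, 42, None, None]

# Keep only the columns where an observation was determined.
pairs = [(p, o) for p, o in zip(predicted_wt_repair, observed_hts_mesc)
         if o is not None]
r = pearson_r([p for p, _ in pairs], [o for _, o in pairs])
print(f"n = {len(pairs)} alleles, Pearson's r = {r:.2f}")
```

With so few points, a coefficient computed this way is only a descriptive summary; the library-scale comparisons in the captions rest on hundreds to thousands of target sites.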


© 2018 Springer Nature Limited. All rights reserved. 


ARTICLE 


https://doi.org/10.1038/s41586-018-0743-5 


Structural plasticity of D3-D14 ubiquitin 
ligase in strigolactone signalling 


Nitzan Shabek, Fabrizio Ticchiarelli, Haibin Mao, Thomas R. Hinds, Ottoline Leyser & Ning Zheng


The strigolactones, a class of plant hormones, regulate many aspects of plant physiology. In the inhibition of shoot
branching, the α/β hydrolase D14, which metabolizes strigolactone, interacts with the F-box protein D3 to ubiquitinate
and degrade the transcription repressor D53. Although multiple modes of interaction between D14 and
strigolactone have recently been determined, how the hydrolase functions with D3 to mediate hormone-dependent
D53 ubiquitination remains unknown. Here we show that D3 has a C-terminal α-helix that can switch between
two conformational states. The engaged form of this α-helix facilitates the binding of D3 and D14 with a hydrolysed
strigolactone intermediate, whereas the dislodged form can recognize unmodified D14 in an open conformation and
inhibits its enzymatic activity. The D3 C-terminal α-helix enables D14 to recruit D53 in a strigolactone-dependent
manner, which in turn activates the hydrolase. By revealing the structural plasticity of the SCF^D3-D14 ubiquitin ligase,
our results suggest a mechanism by which the E3 coordinates strigolactone signalling and metabolism.


Strigolactones represent a class of plant hormones that regulate a variety
of plant growth and developmental processes, such as shoot branching,
root development, leaf senescence and flower size. Strigolactones
are also exuded by plant roots for stimulating interactions with symbiotic
fungi and exploited by parasitic plants to time their seed germination.
As a group of terpenoid lactones, strigolactones typically
comprise a butenolide ring (D ring) connected to a variable tricyclic
lactone (the ABC rings) via an enol-ether bridge. Functional dissection
of both natural and synthetic strigolactone molecules has indicated
that the C and D rings and their linkage are essential for strigolactone
activity, whereas separated ABC or D rings are inactive in plants.

The perception and signal propagation of strigolactones are coordinated
by three highly conserved components: DWARF3 (D3) in
rice, or the Arabidopsis thaliana orthologue MAX2 (also known as
AT2G42620), D14 (AT3G03990) and D53 (LOC4349543) in rice, or the
Arabidopsis homologues SMXL6 (AT1G07200), SMXL7 (AT2G29970)
and SMXL8 (AT2G40130). As a member of the α/β serine hydrolase
superfamily, D14 not only serves as the strigolactone receptor
but also metabolizes strigolactones into tricyclic ABC- and D-ring
products, albeit at a rate that is much slower than most known α/β
hydrolases. D3 in rice (or MAX2 in Arabidopsis) encodes an F-box
protein and binds Arabidopsis SKP1-like protein (ASK1) to function as
a substrate receptor of an SKP1-CUL1-F-box (SCF) ubiquitin ligase
complex. Recent studies have shown that D3 or MAX2, when
bound to D14, mediates the inhibition of shoot branching by sensing
strigolactones and ubiquitinating D53 (or the Arabidopsis homologues
SMXL6, SMXL7 and SMXL8), which is a key nuclear repressor that
regulates distinct developmental processes and target genes of
strigolactone signalling.

Early structural studies of strigolactone perception focused on the
binding of the hormone to isolated D14 orthologues. Crystal
structures of several D14 orthologues, either in their apo or ligand-bound
forms, revealed a common α/β fold with a large, solvent-exposed
ligand-binding pocket. Thus, strigolactones have
previously been thought to be perceived by D14 orthologues in this
open conformation, although possible conformational changes have
also been suggested. A recent study of the pea (Pisum sativum) D14
orthologue RAMOSUS3 (RMS3) suggested that the α/β-fold hydrolase
is a single-turnover enzyme, which produces a covalent D-ring-enzyme
complex via the catalytic histidine after substrate hydrolysis and the
rapid release of the ABC ring. The crystal structure of rice ASK1-D3 in
complex with Arabidopsis D14 (AtD14; all uses of D14 without a species
prefix refer to D14 from Oryza sativa) further uncovered a closed
conformation of D3-bound AtD14, which sequesters the covalently linked
intermediate molecule (CLIM) of strigolactone inside a small enclosed
pocket. These results raised the possibility that CLIM might represent
the active form of the hormone. However, this proposition is complicated
by the identification of multiple non-hydrolysable strigolactone
agonists.

To better delineate the signalling-competent form of D14 in the
context of substrate recognition by the SCF E3 and its relationship
with hormone hydrolysis, we have performed structure-function
studies of the homogeneous rice D14-D3-D53 system. Our analyses
have revealed not only structural plasticity in D3 but also functional
states of SCF^D3-D14 that are switchable by D53 for strigolactone
hydrolysis.


Structural plasticity of the C-terminal α-helix of D3
We first independently determined the crystal structure of D3 in com- 
plex with ASK1. D3 contains an N-terminal F-box motif that forms a 
canonical interface with ASK1. The C-terminal domain of D3 consists 
of 20 leucine-rich repeats (LRRs) and adopts a fully circularized sole- 
noid fold; the last LRR (LRR20) of D3 makes direct contact with the 
three N-terminal LRRs and the C-terminal α-helix (CTH) of ASK1
(Fig. 1a). Distinct from most F-box proteins that contain LRRs, the 
extreme C-terminal residue of D3 (Asp720) is strictly conserved among 
diverse plant species (Extended Data Fig. 1a). The backbone and side- 
chain carboxyl groups of this Asp720 are simultaneously anchored to 
a positively charged pocket constructed by ASK1, the F-box motif and 
the LRR domain of D3 (Extended Data Fig. 1b, c). 

We subsequently identified a second crystal form of the D3-ASK1
complex, which was crystallized in a different space group with two
copies of the complex in the asymmetric unit (Extended Data Table 1).
A typical LRR consists of a β-strand, an α-helix and an intervening
loop. In contrast to the first D3-ASK1 structure, the LRR20 α-helix in
one copy of D3 is mostly reshaped into an extended conformation,
although its C-terminal Asp720 residue remains engaged with the basic
binding pocket (Fig. 1b, Extended Data Fig. 1d). The electron density
of the LRR20 α-helix in the other copy of D3 is entirely missing,
which indicates that it is dislodged from the LRR domain and becomes
structurally disordered (Fig. 1c). Concurrent with the remodelling of
LRR20, the α-helix of the adjacent repeat (LRR19) is shifted away from
the LRR18 α-helix towards the space that was originally occupied by
the LRR20 α-helix (Fig. 1d). In a similar but non-isomorphous crystal,
the electron density of the LRR20 α-helix is absent in both copies of
the complex (Extended Data Table 1). Together, these structural results
strongly suggest that the CTH of D3 (D3-CTH) can have a dynamic
topology that is capable of switching between engaged and dislodged
states. Using limited proteolytic digestion, we further verified that
the C-terminal end of D3 is more sensitive to protease cleavage than
the rest of the protein (Fig. 1e, Extended Data Figs. 2, 3). Therefore, the
conformational plasticity of the CTH of D3 is an inherent property of
the F-box protein in solution.

Fig. 1 | Structural plasticity of the D3 C-terminal α-helix. a-c, Overall
structures of ASK1 (green) bound to D3 (blue) with its C-terminal α-helix
in variable conformation. d, Superposition of ASK1-D3 structures shown
in a (blue) and c (green), focusing on LRR18-LRR20. LRR20 (orange) is
disordered in the third crystal form (shown in c). Black arrow indicates
the conformational shift of LRR19 when LRR20 is disordered. e, Limited
proteolysis of ASK1 in complex with the D3 protein with (ASK1-D3) or
without (ASK1-D3(ΔCTH)) the D3-CTH. The experiment was repeated
three times. The D3 protein was purified with its N-terminal (NTD)
and C-terminal (CTD) segments tightly associated with each other
(see Methods). 'Deg' indicates the proteolytic product of D3-CTD.


D3-CTH binds and inhibits D14 

This unusual structural feature of D3-CTH prompted us to investigate 
its role in the D3-D14 interaction. We first established a quantitative 
method for measuring the GR24-dependent interaction between D3 
and D14 in an AlphaScreen assay (Extended Data Fig. 4a). In a dose- 
dependent manner, a 28-amino-acid peptide of D3-CTH was able to 
compete with D3 for binding D14 at a saturating concentration of GR24 
(Fig. 2a). When fused to glutathione S-transferase (GST), D3-CTH 
robustly pulled down D14 in a GR24-dependent manner (Fig. 2b), 
indicating that it can directly interact with the α/β hydrolase. In the
previously reported crystal structure of the AtD14-D3-ASK1 complex,
D3-CTH is fully engaged with the LRR domain. D3-CTH uses
its N-terminal tip and preceding loop to assist the recognition of the
CLIM-bound α/β hydrolase by three other D3 C-terminal LRRs
(LRR17-LRR19) (Extended Data Fig. 4b). Without the rest of the F-box 
protein, it is unlikely that our D3-CTH peptide interacts with CLIM-bound
D14 in a similar manner.

Fig. 2 | D3-CTH binds and inhibits D14. a, AlphaScreen assay measuring
the ability of D3-CTH peptide to compete with His-D3 for binding the
GST-D14 complex. Ten-micromolar GR24 was used to ensure constant
binding between D3 and D14 (mean ± s.d. of biological triplicates).
b, Pull-down assay showing direct GR24-dependent interaction between
GST-D3-CTH and His-D14 (experiment repeated three times). WB,
western blot. c, Biphasic kinetics of YLG hydrolysis by D14. The enzyme
was used at a concentration of 0.125 μM to better separate the initial phase
from the slow linear phase (purple). An equal amount of extra enzyme was
added in a second identical sample (blue) to rule out the possibility that
the slow linear phase is due to substrate depletion. d, e, Kinetics of YLG
hydrolysis by D14 (0.25 μM) ± ASK1-D3 or ± D3-CTH. f, Dose-response
curve of D3-CTH in inhibiting the enzyme activity of D14. 
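
A dose-response curve of this kind is commonly summarized by a half-maximal inhibitory concentration (IC50). The sketch below is a minimal illustration under stated assumptions: it models remaining enzyme activity with a simple Hill-type inhibition curve and estimates the IC50 by log-linear interpolation between the two measurements that bracket 50% activity. The function names, the concentrations and the assumed IC50 of 5 μM are all hypothetical; the caption does not state how the authors fitted their data.

```python
import math

def fractional_activity(inhibitor_um, ic50_um, hill=1.0):
    """Remaining enzyme activity (0 to 1) at a given inhibitor concentration."""
    if inhibitor_um <= 0:
        return 1.0
    return 1.0 / (1.0 + (inhibitor_um / ic50_um) ** hill)

def interpolate_ic50(concs_um, activities):
    """Estimate the IC50 by interpolating on log10(concentration).

    Expects concentrations in ascending order (all > 0) and activities
    that decrease with concentration.
    """
    points = list(zip(concs_um, activities))
    for (c_lo, a_lo), (c_hi, a_hi) in zip(points, points[1:]):
        if a_lo >= 0.5 >= a_hi and a_lo != a_hi:
            frac = (a_lo - 0.5) / (a_lo - a_hi)
            log_ic50 = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_ic50
    raise ValueError("activity never crosses 50% in the sampled range")

# Hypothetical titration generated from the model itself (not measured data).
ASSUMED_IC50_UM = 5.0
concs_um = [0.625, 2.5, 10.0, 40.0]
activities = [fractional_activity(c, ASSUMED_IC50_UM) for c in concs_um]
estimate = interpolate_ic50(concs_um, activities)
print(f"interpolated IC50 = {estimate:.2f} uM")
```

Interpolating on a logarithmic concentration axis matches how such titrations are usually plotted; a full nonlinear fit would be preferable when the measurements are noisy.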


D14 hydrolyses the fluorogenic strigolactone agonist Yoshimulactone
Green (YLG) with a biphasic time course that is characterized by a
rapid initial phase followed by slow linear hydrolysis (Fig. 2c). Such a
two-stage reaction has been reported for RMS3, which becomes completely
inhibited by CLIM after hydrolysing a substrate molecule.
Instead of being a single-turnover enzyme, however, D14 slowly
released the D-ring under our experimental conditions and continued
to hydrolyse additional substrate, as evidenced by the slow linear
phase of its enzyme kinetics. Consistent with the recognition and
stabilization of CLIM-bound D14 by D3-ASK1, the addition of recombinant
D3-ASK1 to D14 reduced the substrate hydrolysis rate of D14
in the slow linear phase without compromising the rapid initial reaction
(Fig. 2d). An increasing amount of the isolated D3 C-terminal
peptide not only blocked the slow linear hydrolysis but also inhibited
the initial reaction, that is, the first cycle of YLG hydrolysis by D14
(Fig. 2e). Moreover, the half maximal inhibitory concentration of the
D3 C-terminal peptide in inhibiting D14 and its affinity to the α/β
hydrolase are in the same range (Fig. 2a, f). These results suggest that 
the C-terminal region of D3—when dislodged from the LRR domain— 
can interact with and block the enzymatic activity of D14 in a manner 
that is different from the engaged form of this C-terminal region. 
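
The biphasic time course described above resembles a standard burst-phase model, in which product rises exponentially during the first turnover and then accumulates at a slow constant rate: P(t) = A(1 − e^(−kt)) + v·t. The sketch below is only an illustration of that generic model. The function names are hypothetical, and every parameter value is an assumption chosen to mimic the qualitative pattern in the text (D3-ASK1 lowers only the slow phase; the isolated D3-CTH peptide lowers both phases), not a value fitted to the paper's data.

```python
import math

def product_formed(t_min, burst_amplitude, k_burst, v_linear):
    """Product (arbitrary units) at time t for a burst-then-linear model."""
    return burst_amplitude * (1.0 - math.exp(-k_burst * t_min)) + v_linear * t_min

def hydrolysis_rate(t_min, burst_amplitude, k_burst, v_linear):
    """Instantaneous rate dP/dt of the same model."""
    return burst_amplitude * k_burst * math.exp(-k_burst * t_min) + v_linear

# Hypothetical parameter sets, for illustration only.
CONDITIONS = {
    "D14 alone": dict(burst_amplitude=1.0, k_burst=2.0, v_linear=0.05),
    # Full-length D3-ASK1: slow phase reduced, burst left unchanged.
    "D14 + D3-ASK1": dict(burst_amplitude=1.0, k_burst=2.0, v_linear=0.01),
    # Isolated D3-CTH peptide: both the burst and the slow phase suppressed.
    "D14 + D3-CTH": dict(burst_amplitude=0.2, k_burst=2.0, v_linear=0.005),
}

for name, params in CONDITIONS.items():
    early_product = product_formed(1.0, **params)
    late_rate = hydrolysis_rate(30.0, **params)
    print(f"{name:15s} P(1 min) = {early_product:.3f}  late rate = {late_rate:.4f}")
```

In this form the burst amplitude reports on the first catalytic cycle and the linear term on subsequent turnover, which is why the two inhibitors can be told apart from the shape of the curve alone.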


Structure of a D3-CTH-D14-GR24 complex 

To map the binding mode of a D3 C-terminal peptide to D14 bound 
to GR24, we crystallized and determined the structure of D14 that is 
N-terminally fused with a D3 C-terminal sequence in the presence of 
GR24. In the crystal, the D3 C-terminal sequence in one polypeptide 
chain acts in trans and interacts with D14 of a neighbouring molecule, 
which contains an island of electron density near the catalytic triad of 
D14 (Extended Data Fig. 5a). Similar to the isolated form of D14, D14 
bound to D3-CTH adopts an open conformation that is characterized 
by a solvent-accessible substrate-binding pocket (Fig. 3a). The electron 
density inside the pocket matches the overall shape of the GR24 D ring, 
although the position of this density is different from where the GR24
degradation product, 5-hydroxy-3-methylbutenolide (D-OH), has
previously been found (Extended Data Fig. 5b, c). We tentatively
assigned this density to the D ring of an unhydrolysed hormone for
several reasons. First, the crystallized D14 fusion polypeptide shows little
enzymatic activity in hydrolysing substrate (Extended Data Fig. 5d).

Fig. 3 | Structure of the D3-CTH-D14-GR24 complex. a, Top view of
D14 (purple) bound to D3-CTH (orange), with an open ligand-binding
pocket. SL, strigolactone. b, Close-up view of the ligand-binding pocket
of D14 bound to D3-CTH (magenta), superimposed with D14 bound to
GR24 (grey, RCSB Protein Data Bank code (PDB) 5DJ5). The catalytic
triad of D14 is shown in sticks. A curved arrow indicates the removal of
the GR24 D ring away from the catalytic serine residue in the D14 bound
to D3-CTH. c, Interface between D3-CTH (orange) and D14 (magenta)
with interacting residues shown in sticks. d, A comparison of the
structural context of D3-CTH (orange) in the ASK1-D3 complex (crystal
form 1; blue, left) and the D14-D3-CTH complex (magenta, right).


Second, the electron density is extended beyond the D ring, and points 
towards the exit of the hormone-binding pocket. Third, the location 
of the D ring predicts that the tricyclic ABC rings are largely solvent- 
exposed, which could explain their missing electron density. In comparison
to the previously reported GR24-D14 structure, the hormone
is markedly removed from the catalytic centre instead of being poised
for hydrolysis (Fig. 3b). The relative position of the hormone, and its 
orientation to the active site, suggest that it is bound to the enzyme in 
a non-reactive configuration. 

Upon binding to D14, the D3-CTH sequence adopts the same
α-helical conformation as seen in its engaged form (Fig. 3a, d).
D3-CTH docks to a surface site on D14 that is opposite to where D3
binds in the structure of the AtD14-D3-ASK1 complex (Extended
Data Fig. 5e). At one end of this interface, Glu700 clamps the D3 α-helix
to the hydrolase by making polar interactions with Ser224 and His133
on the D14 αE helix and β6-αT1 loop, respectively. At the other end of
the interface, D3-CTH inserts Leu707 into a hydrophobic cleft formed
between the αE helix and β8 strand of D14 (Fig. 3c). As a whole, the
helical portion of the D3 C-terminal sequence buries a total of 800 Å²
surface area on D14. If the acidic C terminus of D3-CTH were not
fused to the N terminus of the neighbouring D14 molecule, it might
be able to interact with a nearby basic D14 surface (Extended Data
Fig. 5f).

Superposition analysis of free D14 and D14 bound to D3-CTH 
reveals a slight rotation of the cap domain around the hormone- 
binding pocket, which could couple the docking of D3-CTH to the 
binding of the unhydrolysed hormone (Extended Data Fig. 5g). A closer 
comparison of all D14 structures also reveals a potential allosteric 
pathway that links D3-CTH binding to the D14 catalytic triad 
(Extended Data Fig. 5h). Importantly, D3-CTH uses several common 
residues to either bind D14 in its open conformation or engage with 
the rest of the LRR domain (Fig. 3d). The incompatibility of the two 
structures strongly suggests that D3-CTH binds D14 when dislodged 
from the LRR domain. Overall, the binding mode of the D3 C-terminal 
peptide to D14 reflects a functional state of SCF^D3-D14 that is different
from D14 bound to CLIM.


Reactivation of D3-bound D14 by D53

In an in vitro protein degradation system, we next reconstituted
proteasome-mediated degradation of recombinant D53 with cell-free
extracts prepared from Arabidopsis Col-0 seedlings (Fig. 4a). Consistent
with the essential role of MAX2 in strigolactone signalling, max2-1
extracts lack D53-degrading activity but can be rescued by the addition
of recombinant D3 and D14. On the basis of its sequence homology
with proteins of the class I Clp ATPase family, D53 is predicted to
contain an N-terminal domain and two putative ATPase domains (D1 and
D2). We purified each of these D53 domains fused to GST and found
that the D2 domain of D53 (D53-D2) is solely responsible for binding
D14 in a GR24-dependent manner (Extended Data Fig. 6a). Both
full-length D53 and the isolated D2 domain can form a stable complex with
D14-D3-ASK1 in the presence of GR24 as detected by size-exclusion
chromatography (Fig. 4b, Extended Data Fig. 6b). Although previous
studies have suggested that D14 and D3 can individually interact with
D53, we found that the D2 domain of D53 becomes stably associated
with D14 only in the presence of D3 and GR24. The three binding
partners, therefore, assemble cooperatively into a ternary complex,
which explains the degradation of the D2 domain of recombinant D53
by the proteasome in a MAX2-dependent manner (Fig. 4a). Together,
these data pinpoint the D2 domain of D53 as the functional module for
hormone-induced and SCF^MAX2-D14-catalysed turnover.

We next used the D2 domain of D53 to probe the role of the
C-terminal region of D3 in D14-mediated substrate binding. Consistent
with the ability of D3 to flip out its CTH without compromising its
structural integrity, truncating the 28-amino-acid C-terminal region
had no detectable effect on the folding and solution behaviour of D3
(Extended Data Fig. 6c). However, the C-terminally truncated D3
mutant protein could neither form a ternary complex with D14 and
D53-D2 on a sizing column, nor restore the D53 degradation activity
of the max2-1 extracts (Fig. 4a, Extended Data Fig. 4c, d), which
indicates a critical role of the C-terminal region of D3 in substrate
recruitment by SCF^D3-D14. The isolated C-terminal peptide of D3
was able to stimulate D14 and D53-D2 to pull down one another in
the presence of GR24 (Fig. 4c, Extended Data Fig. 7a). In the more
quantitative AlphaScreen assay, the D3 peptide (but not two shorter
versions) elicited the same effect in a dose-dependent manner (Extended
Data Fig. 7b). Mutation of a single D14 residue (S224E) at the interface
revealed in our D3-CTH-D14 structure compromised D3-CTH-D14
binding and was sufficient to prevent GST-D53-D2 from pulling down
D3 (Extended Data Fig. 7c, d). These data strongly suggest that the
C-terminal region of D3 helps recruit D53 when D3-CTH is liberated
from the LRR domain of D3 and becomes compatible for binding
D14 with the canonical open conformation. This notion is further
corroborated by the impaired D53 degradation observed with either
the isolated D3-CTH peptide or the D14(S224E) mutation in the
cell-free extracts (Extended Data Fig. 7e, f). D53 was originally identified
through the gain-of-function rice mutant d53, the gene product of
which becomes resistant to strigolactone-induced degradation owing
to the loss of four amino acids in the D2 domain. Accordingly, the
D2 domain of the recombinant d53 mutant protein was unable to pull
down D14 in the presence of D3-CTH and GR24, and remained stable
in Col-0 cell extracts (Extended Data Fig. 7a, g). These results further
support the functional relevance of the D3-CTH-mediated recruitment
of D53 to SCF^D3-D14.

Given the structural flexibility of D14, we next investigated the effect 
of substrate binding on the hydrolase activity of D14. Similar to GR24, 
YLG can induce complex assembly among D14, D3-ASK1 and the D2 
domain of D53 (Extended Data Fig. 7h). By monitoring YLG hydro- 
lysis, we detected little change in the enzymatic kinetics of D3-ASK1- 
arrested D14 when the D2 domain of D53 was present (Extended Data 
Fig. 7i). By contrast, the addition of D53 robustly blocked the inhibi- 
tion of the enzymatic activity of D14 by the D3 C-terminal peptide, 
both in the rapid initial reaction and the slow linear phase (Fig. 4d). 
Together, these results suggest that the enzymatic activity of D14 within 
the SCF>>-P ubiquitin ligase complex is susceptible to modulation 
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Fig. 4 | Interactions among D3-CTH, D53 and D14. a, Time-dependent 
degradation of GST-D53 and GST-D53-D2 in Arabidopsis seedlings 

of Col-0 and max2-1 mutant extracts, with and without recombinant 
D14 and either D3 or D3(ACTH). MG132, proteasome inhibitor. 

b, Size-exclusion chromatography analysis of the interaction between the 
D2 domain of D53 and D14-GR24 + ASK1-D3, with sodium dodecyl 


by D53 binding, and such a modulatory effect is dependent on the 
conformational state of the CTH of the F-box protein. 


SMXL7 levels are compromised by D3-CTH 

To further validate the role of D3-CTH in vivo, we expressed SMXL7- 
YFP alone or in combination with AtD14, AtD14 fused to MAX2-CTH 
(AtD14-CTH), or MAX2-CTH (CTH) alone in tobacco epidermal cells. 
Despite the cross-species reactions, SMXL7 was markedly destabilized 
upon GR24 treatment (23% reduction, Fig. 5a, e), which indicates that 
the strigolactone perception machinery that is endogenous to tobacco 
epidermal cells is sufficient to induce GR24-dependent degradation of 
SMXL7. In support of the functionality of Af{D14 in tobacco epidermal 
cells, the above response was further accentuated in nuclei that co- 
express AtD14-CTH-mCherry—reaching a nearly 50% reduction 
in the level of SMXL7 by the end of the incubation time (Fig. 5b, e). 
However, this enhancement effect was completely eliminated and 
reversed to 11.8% and 6% reduction when SMXL7-YFP was co- 
expressed with D14-CTH-mCherry or CTH-NLS (nuclear localization 
sequence)—mCherry, respectively (Fig. 5c-e). The MAX2-CTH, there- 
fore, not only prevented AtD14 from accelerating the degradation of 
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Fig. 5 | SMXL7-YFP stability in response to GR24 and D3-CTH 
expression. a—d, Representative images of levels of SMXL7 in response 
to GR24 application in tobacco epidermal cells. Levels of SMXL7-YFP 
(yellow) at 0 and 120 min post-treatment are shown for single nuclei that 
co-express empty vector (a), D14-mCherry (b), D14-CTH-mCherry 

(c) or CTH-NLS-mCherry (d), all of which are displayed in magenta. 
Scale bars, 10 zm. e, Relative SMXL7 abundance at 2 h post-treatment, 
comparing GR24 and mock treatment in cells that either express 
SMXL7-YFP alone or co-express SMXL7 and D14-mCherry, D14-CTH- 
mCherry or CTH-NLS-mCherry, respectively. Yellow dots and bars are 
mean +s.e.m. ***P < 0.001, n=7 (nuclei), two-tailed Student's t-test. 
Black dots, data more than 3 s.d. from mean. Coloured boxes represent 
the central 50% of the distribution, with the median shown as a horizontal 
bar. Top and bottom vertical bars represent 75-100% and 0-25% of the 
distribution of data points, respectively. P values for empty vector, D14, 
D14-CTH and CTH-N7 (N7 is a nuclear localization sequence) are 

1.294 x 10~°, 7.188 x 10-4, 4.401 x 10-7 and 3.86 x 10”, respectively. 
sulfate-polyacrylamide gel electrophoresis analysis of the elution fractions. 
c, GST pull-down assay using recombinant GST-D14 and non-tagged 
D53-D2 + D3-CTH and/or GR24. d, Kinetics of YLG hydrolysis by D14 
in the presence of D3-CTH at increasing concentrations. All experiments 
repeated three times. 


SMXL7 but also impaired the destabilization of SMXL7 by endogenous 
strigolactone-signalling components, in response to GR24 treatment. 


A model of the functional states of SCF^D3–D14 

Our studies have uncovered a structural plasticity in the D3 F-box 
protein, which can adopt two distinct structural states by altering 
the topology of its CTH. With an engaged CTH, the F-box protein 
is structurally compatible for binding the inactive closed conforma- 
tion of CLIM-bound D14. When the D3-CTH is unleashed from the 
LRR domain, D3 uses the helical structural element to capture hor- 
mone-bound D14 via a different interface. In this binding mode, the 
hydrolase maintains its open conformation, which allows its enzymatic 
activity to be tunable by D53, the substrate of the SCF E3. We postulate 
that the plant SCF^D3–D14 complex has evolved these unusual features 
to orchestrate strigolactone sensing, substrate polyubiquitination 
and hormone metabolism in a highly coordinated manner. To explain 
the activity of non-hydrolysable strigolactone agonists®**-*, we 
propose that D14 perceives and transduces the hormonal signal in 
its open conformation, which is recognized by the D3-CTH and is 
competent for D53 binding (Extended Data Fig. 8, Supplementary 
Discussion). Before loading the SCF substrate, D3 arrests strigolac- 
tone-bound D14 to prevent premature hydrolysis of the hormone. 
Hormone-dependent association of D53 with SCF^D3–D14 not only 
triggers D53 polyubiquitination but also licenses D14 to catalyse 
strigolactone metabolism, which takes place while D53 is being fully 
modified (or after D53 has been fully modified) by a ubiquitin chain. 
A more-detailed quantitative understanding of the timing of strigo- 
lactone hydrolysis and polyubiquitin chain assembly on a substrate 
awaits further studies. 
METHODS 


No statistical methods were used to predetermine sample size. The experiments 
were not randomized and investigators were not blinded to allocation during 
experiments and outcome assessment. 

Protein preparation and purification. The full-length rice D3 (O. sativa) and 
A. thaliana ASK1 were co-expressed as a 6 × His–2 × Msb (msyB)* fusion protein 
and an untagged protein, respectively, in Hi5 suspension insect cells. The ASK1-D3 
complex was isolated from the soluble cell lysate by Q Sepharose High Performance 
resin (GE healthcare). NaCl eluates (500 mM) were subjected to Nickel Sepharose 
Fast Flow (GE Healthcare) and were eluted with 250 mM imidazole. To remove 
the 6 × His–2 × Msb fusion tag, the clarified complex was cleaved at 4°C for 16 h 
by tobacco etch virus (TEV) protease, and was further purified by anion exchange 
and gel-filtration chromatography. For crystallization and biochemical analysis 
purposes, the D3-expressing construct was designed to eliminate a non-conserved 
40-residue disordered loop between amino acid 476 and amino acid 514 after 
affinity purification. The resulting D3 fusion protein contains a 6 × His–2 × Msb 
tag at the N terminus and three TEV protease sites: between the Msb tag and D3, 
after T476 and before L514, yielding a purified split form of D3 with D3 N-terminal 
domain (1-476) and C-terminal domain (514-720) stably associated (Extended 
Data Figs. 2, 3). D3(ΔCTH) (O. sativa, residues 1-693) was co-expressed with 
ASK1 and purified in the same manner as full-length D3. Purified ASK1-D3 
and ASK1-D3(ΔCTH) complexes were independently eluted as a single mono-dispersive 
peak off a Superdex-200 gel-filtration column (GE Healthcare) with 
an estimated molecular weight of 93 kDa or 90 kDa, respectively. The D3(ΔC10) 
construct, which lacks the C-terminal 10 amino acids, was also purified using a 
similar procedure. Rice D14 protein (O. sativa, residues 52-318) was expressed as 
a 6 × His–SUMO fusion protein from the expression vector pSUMO (LifeSensors, 
and a gift from E. Xu). BL21 (DE3) cells transformed with the expression 
plasmid were grown in LB broth at 16°C to an OD600 of ~1.0 and induced with 
0.1 mM IPTG for 16 h. Cells were collected, re-suspended and lysed in extract 
buffer (20 mM Tris-HCl, pH 8.0, 200 mM NaCl). His-SUMO-D14 was isolated 
from soluble cell lysate by Ni-NTA resin. The eluted His-SUMO-D14 was sub- 
jected to anion exchange and the eluted His-SUMO-D14 was cleaved overnight 
with SUMO protease (Ulp1, LifeSensors) at a protease-to-protein ratio of 1:1,000 
at 4°C. The cleaved His-SUMO tag was removed by passing through a Nickel 
Sepharose column, and the protein was further purified by chromatography 
through a Superdex-200 gel-filtration column in 20 mM Tris, pH 8.0, 200 mM 
NaCl, 2 mM DTT. Full-length D53 (O. sativa) was expressed as a GST fusion 
protein in Hi5 suspension insect cells. D14 (O. sativa, residues 52-318), D3-CTH 
(O. sativa, residues 693-720), D53 N domain (D53-N, residues 1-181), D53 D1 
domain (D53-D1, residues 182-406), D53 D2 domain (D53-D2, residues 718- 
1,131), and the D2 domain of the d53 mutant (F811T followed by deletion of resi- 
dues 812-818, as previously described!*!°) were expressed as GST fusion proteins 
in BL21 (DE3) cells. GST-tagged proteins were isolated by glutathione sepharose 
(GE Healthcare) using a buffer containing 50 mM Tris-HCl, pH 7.5, 200 mM NaCl, 
4% glycerol, 5 mM DTT. Proteins were further purified by either elution with 5-8 
mM glutathione (Fisher BioReagents), or on-column cleavage by TEV, followed by 
anion exchange and size-exclusion chromatography. All proteins were concentrated 
by ultrafiltration to 3-10 mg ml⁻¹. 

Crystallization, data collection and structure determination. The crystals of 
ASK1-D3 form 1 complex were grown at 4°C by the hanging-drop vapour diffusion 
method with 1.0 µl protein-complex sample mixed with an equal volume of 
reservoir solution containing 6.5% CP-42, 175 mM sodium citrate tribasic dihy- 
drate, 87 mM HEPES sodium pH 7.5 and 26% MPD. Crystals of maximal sizes were 
obtained and collected after 2 weeks. The heavy-atom derivative ASK1-D3 form 
1 crystals were prepared by soaking the native crystals in the presence of 10 mM 
K3Pt(NO2)4 for 4 h. The crystals of the ASK1-D3 form 2 complex were grown at 
25°C by the hanging-drop vapour diffusion method with 1.0 µl protein-complex 
sample mixed with an equal volume of reservoir solution containing 80 mM Tris- 
HCl, pH 7.0, 24% MPD, 24% PEG1000, 24% P3350; 15 mM sodium citrate tribasic 
dihydrate pH 5.6 and 0.5 M 1,6-hexanediol. The crystals of ASK1-D3 form 3 
complex were grown at 25°C by the hanging-drop vapour diffusion method with 
2.0 µl protein-complex sample mixed with an equal volume of reservoir solution 
containing 150 mM Tris-HCl, pH 7.4, 22% MPD, 22% PEG1000, 22% P3350, 
15 mM sodium citrate tribasic dihydrate pH 5.6, 0.45 M 1,6-hexanediol, and 5 mM 
DTT. The crystals of D14-D3-CTH were grown at 4°C by the hanging-drop 
vapour diffusion method with 1.0 µl protein-complex sample mixed with an equal 
volume of reservoir solution containing 0.02 M amino acid mixture (Glu, Ala, Gly, 
Lys and Ser); imidazole; 0.1 M MES monohydrate, pH 6.5, 40% glycerol and 20% 
PEG4000. The single anomalous dispersion dataset was collected near the platinum 
absorption edge (λ = 1.072 Å). X-ray diffraction data were integrated and scaled 
with HKL2000 package*”. Single anomalous dispersion was used to determine the 
initial phase using PHENIX* with a 2.5 Å platinum derivative dataset for ASK1- 
D3 form 1. Initial structural models were built, refined and rebuilt using COOT?® 
and PHENIX. The final model was built and refined with a native dataset. The 
crystals of ASK1-D3 form 2 and ASK1-D3 form 3 complexes were determined by 
molecular replacement using ASK1-D3 form 1 structure as the search model. The 
D14-D3-CTH structure was determined by molecular replacement using rice D14 
structure (PDB 4IH9)” as the search model. All structural models were manually 
built, refined, and rebuilt with PHENIX and COOT. 

AlphaScreen luminescence proximity assay. AlphaScreen assays for determining 
and measuring protein-protein interactions were performed using EnSpire reader 
(PerkinElmer). GST-tagged D53 or D14 was attached to glutathione AlphaScreen 
donor beads. His-tagged D14 or D3 was attached to anti-6 x His conjugated 
AlphaScreen acceptor beads. The donor and acceptor beads were brought into 
proximity by the interactions between D14, D53 and ASK1-D3 complex, which 
were measured with and without GR24 and/or non-tagged proteins at indicated 
concentrations. When excited by a laser beam of 680 nm, the donor beads emit 
singlet oxygen that activates thioxene derivatives in the acceptor beads, which 
then release photons of 520-620 nm as the binding signal. The experiments were 
conducted with 100-500 nM of D14 or D53 and 1 µM ASK1-D3 complex proteins 
in the presence of 5 µg/ml donor and acceptor beads in a buffer of 50 mM MES, pH 
6.5, 150 mM NaCl, 1 mM DTT and 0.1 mg/ml bovine serum albumin. The results 
were based on an average of three experiments with standard errors typically <10% 
of the measurements. Half maximal inhibitory concentration values were deter- 
mined using nonlinear curve-fitting of graphs generated with Prism 6 (GraphPad). 
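
As an illustration of how a half-maximal inhibitory concentration can be estimated from a competition series, the sketch below fits a logistic curve by a coarse least-squares grid search. It is not the Prism 6 procedure used in this study: the function names, the fixed 0-100% signal range, the search grid and the synthetic data points are all assumptions made only for this example.

```python
def logistic(conc, ic50, hill, top=100.0, bottom=0.0):
    """Signal (% of uninhibited control) at a given competitor concentration."""
    return bottom + (top - bottom) / (1.0 + (conc / ic50) ** hill)


def fit_ic50(concs, signals):
    """Least-squares grid search over IC50 (log-spaced) and Hill slope."""
    best = None
    for i in range(-300, 301):      # IC50 from 1e-3 to 1e3, same units as concs
        ic50 = 10 ** (i / 100)
        for j in range(5, 31):      # Hill slope from 0.5 to 3.0 in steps of 0.1
            hill = j / 10
            sse = sum((logistic(c, ic50, hill) - s) ** 2
                      for c, s in zip(concs, signals))
            if best is None or sse < best[0]:
                best = (sse, ic50, hill)
    return best[1], best[2]


# Synthetic, noise-free example data (hypothetical; concentrations in µM)
concs = [0.01, 0.1, 0.3, 1.0, 3.0, 10.0, 100.0]
signals = [logistic(c, ic50=1.0, hill=1.0) for c in concs]
ic50_est, hill_est = fit_ic50(concs, signals)
```

A grid search is used here only to keep the example dependency-free; a proper nonlinear least-squares routine would also fit the top and bottom plateaus and report confidence intervals.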

YLG hydrolysis assay. YLG (TCI America) hydrolysis assays were performed using 
1-2 µg of recombinant proteins in a reaction buffer (50 mM MES pH 6.5, 150 mM 
NaCl and 1 mM DTT) at a 100-µl volume on a 96-well black plate (Greiner). 
The fluorescence intensity was measured by EnSpire 2300 multilabel plate reader 
(PerkinElmer) at excitation by 480 nm and detection by 520 nm. Ninety-six-well 
black half-area plates were covered with Viewdrop III UV plate seals to prevent 
evaporation. Time-course experiments were performed in 10-s intervals over 
50-60 min. Fluorescence data were converted directly to fluorescein concentra- 
tion using a standard curve. Data generated in Excel were transferred to Prism 6 
for graphical analysis and curve-fitting. In all cases in which synthesized peptides 
(Genscript and Biomatik) were analysed, dimethylsulfoxide (DMSO) was added 
in equivalent concentration into the reaction. 
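
The conversion of fluorescence readings to fluorescein concentration through a standard curve can be sketched as follows. This is a minimal illustration that assumes a linear standard curve; the standard values, function names and units are hypothetical and are not taken from the study.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical fluorescein standards: concentration (µM) vs. signal (arbitrary units)
standard_conc_um = [0.0, 0.5, 1.0, 2.0, 4.0]
standard_signal = [50.0, 1050.0, 2050.0, 4050.0, 8050.0]
slope, intercept = fit_line(standard_conc_um, standard_signal)


def signal_to_conc(signal):
    """Invert the standard curve to obtain fluorescein concentration (µM)."""
    return (signal - intercept) / slope


def time_course(signals, interval_s=10):
    """Pair each reading with its time stamp, assuming fixed 10-s intervals."""
    return [(i * interval_s, signal_to_conc(s)) for i, s in enumerate(signals)]
```

The 10-s default interval mirrors the sampling interval stated in the text; everything else is a placeholder.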

Size-exclusion chromatography. Purified proteins (20-50 µM) were incubated 
with 100-200 µM GR24 (Chiralix), or an equal amount of acetone as the solvent 
control, at 4 °C for one hour in 20 mM HEPES, pH 7.0, 150 mM NaCl and 2 mM 
DTT. The reaction was injected onto a Superdex-200 Increase 10/300 column 
(GE Healthcare) for analysis at a flow rate of 0.5 ml min⁻¹. The elution fractions 
(0.5 ml per fraction) were resolved by SDS-PAGE and analysed by Coomassie 
blue G-250 stain. 

Limited proteolytic digestion. One milligram per millilitre of purified ASK1-D3 
(or ASK1-D3(ΔCTH)) was incubated at 4°C for 12 h with increasing amounts of 
trypsin solution containing 0.05 mg ml⁻¹ trypsin (Agilent) at the volume ratio 
of 1:3,000, 1:1,500 and 1:750 in 40 mM Tris-HCl, pH 7.5 and 1 mM DTT. The 
proteolysis reactions were stopped by fivefold-concentrated SDS-PAGE sample 
buffer immediately followed by 5 min boiling at 95°C. Proteins were resolved by 
SDS-PAGE and Coomassie blue G-250 stain. The resolved bands corresponding 
to ASK1 and digested D3-CTD were excised and further analysed by N-terminal 
sequencing (Analytical Core Facility, Tufts Medical School). 

Affinity pull-down assay. Pull-down assays were performed using ~20-40 µg of 
purified GST-tagged proteins as the bait and ~10-25 µg of either His-tagged or 
non-tagged proteins. Reaction mixtures were incubated with GST beads (GE 
Healthcare) at 4°C for 30 min in the reaction buffer with 40 mM Tris-HCl, pH 7, 
100 mM NaCl, 2 mM DTT and 0.01% bovine serum albumin. After an extensive 
wash with a buffer containing 40 mM Tris-HCl, pH 7, 250 mM NaCl, 2 mM DTT 
and 0.01% (v/v) Tween 20, the protein complexes on the beads were either eluted 
by fivefold-concentrated SDS-PAGE sample buffer or by 5 mM glutathione. All 
samples were boiled at 95°C for 5 min and resolved by SDS-PAGE. Proteins were 
analysed as indicated by Ponceau stain or western blot analysis using specific poly- 
clonal anti-GST antibody and monoclonal anti-His antibody (Sigma). In pull-down 
assays in the presence or absence of 20-100 µM GR24, acetone was used as solvent 
control. Input samples represent 5-10% of the total reaction. 

D53 stability in a reconstituted cell-free system. Arabidopsis ecotype Columbia-0 
(Col-0) wild-type and max2-1 mutant seeds were surface-sterilized with 50% (v/v) 
bleach and 0.1% Triton X-100. After cold treatment at 4°C for 48 h, seeds were 
germinated and grown on 0.5× Murashige and Skoog (MS) medium containing 
0.8% agar and 1% sucrose under continuous light at 22°C. One hundred milligrams 
of 7-day-old seedlings were collected and frozen in liquid nitrogen. Total 
proteins were extracted after grinding with native protein extraction kit (Minute, 
Invent Biotechnologies) supplemented with protease inhibitor cocktail (Roche), 
followed by two sequential centrifugations at 12,000g for 10 min. To monitor protein 
degradation in the cell-free system, 0.5 µg of purified GST-tagged proteins 
(either full-length D53, D53-D2 or d53-D2, as indicated) was incubated at 22°C 
in a reaction mixture that contained, at a final volume of 12.5 µl, 1-2 µl of plant 
extract supplemented with 10 µM GR24, 25 mM Tris-HCl, pH 7.4, 0.625 mM 
ATP, 5 mM MgCl2 and 0.5 mM DTT. Where indicated, the proteasome inhibitor 
MG132 (Calbiochem) was added at a concentration of 100 µM. Reactions were 
terminated at the indicated times by the addition of fivefold-concentrated sample 
buffer. Boiled samples were resolved via SDS-PAGE, and proteins were visualized 
using western blot and polyclonal anti-GST antibodies (Sigma). 
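
For readers who want to set up a comparable reaction, the following sketch computes how much of each stock solution is needed to reach the stated final concentrations in a 12.5-µl reaction (C1V1 = C2V2). The final concentrations are those listed above and are all treated as final concentrations in the complete reaction, which is an interpretation; the stock concentrations and all names are hypothetical.

```python
FINAL_VOLUME_UL = 12.5

# (final concentration, stock concentration), both in mM.
# Final values come from the text; stock values are hypothetical examples.
components = {
    "GR24": (0.010, 1.0),
    "Tris-HCl pH 7.4": (25.0, 1000.0),
    "ATP": (0.625, 100.0),
    "MgCl2": (5.0, 1000.0),
    "DTT": (0.5, 100.0),
}


def stock_volume_ul(final_mm, stock_mm, total_ul=FINAL_VOLUME_UL):
    """Volume of stock (µl) giving final_mm in total_ul, from C1*V1 = C2*V2."""
    return final_mm * total_ul / stock_mm


volumes = {name: stock_volume_ul(final, stock)
           for name, (final, stock) in components.items()}
```

Because several of the resulting volumes are well below 1 µl, in practice such components would normally be combined into a concentrated master mix before dispensing.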

Plant growth conditions. Nicotiana benthamiana plants were grown on F2 
compost (Levington Horticulture) pre-treated with 0.2 g l⁻¹ Intercept (Everris). 
Glasshouse conditions were as follows: 16 h light:8 h darkness, minimum irradiance 
88 W per m², shading implemented at 500 W per m² and cooling implemented 
at 31°C. Humidity and temperature were determined by the ambient conditions. 

Cloning and plant transformation. All constructs were cloned using Multi-site 
Gateway (Invitrogen). A previously generated CaMV 35S promoter and SMXL7- 
YFP expression vector were used”. D14, D14-CTH and CTH-NLS sequences were 
synthesized and cloned into pDONR221. The mCherry and YFP fluorescent tags 
were cloned into the pDONR P2-P3R and all ENTRY vectors were then recombined 
in the relevant combinations in pH7m24GW (https://gateway.psb.ugent.be/). 
Agrobacterium tumefaciens strain GV3101 was transformed using standard 
electroporation procedure. 

Assays for transient gene expression mediated by A. tumefaciens in 
N. benthamiana. A. tumefaciens (strain GV3101) carrying the desired transfer 
DNA construct was grown overnight at 28 °C with the appropriate antibiotics. Cells 
were collected by centrifugation at 8,000g and resuspended in agro-infiltration 
medium with 5 mM MES, 10 mM MgCl2, pH 5.6, before syringe infiltration into 
leaves of 3-4-week-old N. benthamiana plants. Bacteria carrying each construct 
were infiltrated at a final OD600 nm of 0.4. Leaves were detached 48 h post-infiltration 
for confocal imaging. 
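
Adjusting a resuspended culture to the final OD600 of 0.4 is a simple dilution calculation, sketched below. The example assumes that optical density scales linearly with cell density over the range used; the function name and the example numbers are hypothetical.

```python
def dilution_volumes(measured_od, target_od, final_volume_ml):
    """Return (ml of cell suspension, ml of infiltration medium) for the target OD."""
    if measured_od < target_od:
        raise ValueError("suspension is already below the target OD; concentrate it first")
    suspension_ml = target_od * final_volume_ml / measured_od
    return suspension_ml, final_volume_ml - suspension_ml


# Example: a suspension measured at OD600 = 2.0, diluted to 10 ml at OD600 = 0.4
suspension_ml, medium_ml = dilution_volumes(2.0, 0.4, 10.0)
```

When several constructs are co-infiltrated, the same calculation would be applied to each strain so that every construct reaches the target density in the mixed suspension.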

Confocal microscopy. All confocal images were captured on a Leica SP8 laser 
scanning confocal using a W Plan-Apochromat 20x 1.0 numerical aperture 
objective (Zeiss). Detection wavelengths: 520-540 nm for SMXL7-YFP, and 600-620 nm 
for mCherry-tagged proteins. The pinhole was set to one airy unit for all nuclei. 
Detection gain and laser power were kept constant between the 0-min and 120-min 
time points for intensity comparison and the same settings were used for all nuclei 
of the same construct combination. Laser power was adjusted across construct 
combinations as necessary to account for differences in expression level and avoid 
signal saturation. 

SMXL7 quantification. Two days after infiltration, leaves were infiltrated with 
A. thaliana salt (ATS) with 0.1% v/v acetone for the mock or 10 µM GR24, 
0.1% v/v acetone for the treated samples. Between 7 and 13 nuclei expressing 
35S:SMXL7-YFP alone or in combination with either 35S:D14-mCherry, 35S:D14- 
CTH-mCherry or 35S:CTH-NLS-mCherry were located and imaged at time 0. 
The same nuclei were then imaged again using identical settings after 120 min 
for each construct combination and treatment. Areas of interest were 
drawn around each nucleus in ImageJ version 2.0.0 and the mean signal intensity 
was recorded. The ratio of the means at 0 min and 120 min post-treatment was 
used to compute relative fluorescence at 120 min, which is expressed as the percentage 
of change in the level of SMXL7. Distributions for each combination and treat- 
ment condition were compared using a two-tailed Student's t-test with Bonferroni 
correction. 
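
The quantification described above can be sketched as follows: a percentage change per nucleus from the 0-min and 120-min mean intensities, a pooled-variance (Student's) t statistic between two groups, and a Bonferroni adjustment of P values. This is an illustrative outline, not the authors' analysis script; the function names are hypothetical, and converting the t statistic to a P value (which requires the t distribution) is left to a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def percent_change(intensity_0, intensity_120):
    """Change in mean nuclear signal at 120 min relative to 0 min, in percent."""
    return (intensity_120 / intensity_0 - 1.0) * 100.0


def student_t(a, b):
    """Two-sample Student's t statistic (pooled variance) and degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1.0 / na + 1.0 / nb))
    return t, df


def bonferroni(p_values):
    """Multiply each P value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

For example, a nucleus whose mean intensity falls from 200 to 100 arbitrary units gives `percent_change(200.0, 100.0)` of -50.0, that is, a 50% reduction.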

Reporting summary. Further information on research design is available in 
the Nature Research Reporting Summary linked to this paper. 


Data availability 

Structural coordinates and structural factors have been deposited in the RCSB 
Protein Data Bank under accession numbers 6BRO, 6BRP, 6BRQ and 6BRT. 
Uncropped gels and blots are available in the Supplementary Information. All 
other data are available from the corresponding author upon reasonable request. 
a, LRR20 α-helix (D3-CTH) 


Rice MFT EMRAESWLRFEVQLNSRQIDD 720 
Arabidopsis MST EMRVGSCSRFEDQLNSRN! 1D 693 
Castor MSTEMRVGSCSRFEDALNRRHIVD 695 
Poplar TCT EMRVGSCSRFEDALNRRQILD 694 
Grapevine MSTEMRIDSCSRFEDALNRRRILD 712 
Cucumber MST EMRAGSCSRFEAALNSRQIPD 715 
MonkeyFlower MSTEMRSDSCSRFEAALNRRQI! SD 713 
Tobacco MSTEMRADSLSRFEAALNRRP! SD 724 
Medicago MSTEMRVGSCIRFEDALNRRQICD 711 
Pea MSTEMRVGSCSRFEDALNRRIICD 708 
Soybean MSTEMRVGSCSRFEDALNRRRICD 711 
Maize MNT EMRAESWLRFEVQLNNRLIED 705 
Sorghum MNTEMRAESWLRFENQLNIRLIED 700 
Moss TTTELRSVSCQRFEALVAKRGFPD 698 


D3-LRR20 


Extended Data Fig. 1 | Conservation and conformation of D3 
C-terminal α-helix. a, Sequence alignment of the C-terminal regions of 
14 orthologues of MAX2 or D3. Highly conserved residues are coloured 
in blue. b, Electrostatic-potential surface map of D3 with CTH shown in 
cartoon representation (orange). The C terminus aspartic acid residue 
(Asp720) is anchored to a positively charged pocket. c, Close-up view of 
the D3 extreme C-terminal residue (Asp720) and its interacting residues in 
D3 and ASK1. d, Electron densities of the D3-CTH region in two different 
crystal forms, adopting either a regular helical conformation (left) or an 
extended conformation (right). 
[Extended Data Fig. 2 alignment panels: multiple sequence alignment of the 14 D3 or MAX2 orthologues listed in the legend.] 


Extended Data Fig. 2 | Sequence alignment and analysis of 
selected orthologues of D3 or MAX2. Orthologues of D3 or 
MAX2 are selected and aligned from rice (O. sativa) (accession 
XP_015643693), A. thaliana (NP_565979), castor (Ricinus communis) 
(XP_002528551), poplar (Populus trichocarpa) (XP_002320412), 
grapevine (Vitis vinifera) (XP_010657042), cucumber (Cucumis sativus) 
(XP_004137031), monkey flower (Erythranthe guttata) (XP_012832933), 
tobacco (Nicotiana sylvestris) (XP_009757168), medicago (Medicago 
truncatula) (XP_003607592), pea (P. sativum) (ABD67495), soybean 
(Glycine max) (XP_003540983), maize (Zea mays) (XP_020394883), 
sorghum (Sorghum bicolor) (XP_002436499) and moss (Physcomitrella 
patens) (XP_024400746). The non-conserved region designed to be 
truncated by TEV cleavage during recombinant D3 purification is 
underlined in green. 
[Extended Data Fig. 2 alignment panels, continued.] 

Extended Data Fig. 3 | Comparison of D3-ASK1 structures. a, Top 
view of ASK1-D3 crystal structure (orange) based on PDB 5HYW. Red 
arrows indicate a gap in the polypeptide model. Note that PDB 5HYW has 
a polypeptide register error ranging from amino acid 373 to 473 before 
the gap. b, Superposition of ASK1-D3 determined in this study (light 
blue) with PDB 5HYW. The region truncated by design ranges from N474 
to L516, which are indicated by red arrows. c, Superposition of all three 
crystal forms of ASK1-D3 determined in the current study. d, Limited 
trypsin digestion assay of ASK1-D3 and ASK1-D3(ΔCTH). Proteins were 
resolved by SDS-PAGE followed by Coomassie blue stain, focusing on D3 
C-terminal domain. The experiment was repeated three times. 
Extended Data Fig. 4 | Established binding between D3 and D14.
a, AlphaScreen assay measuring direct interaction between GST-D14
and His-D3 in response to increasing amounts of GR24 (mean ± s.d. of
biological triplicates). b, The binding interface between CLIM-bound D14
(magenta) and the LRR domain of D3 (blue) (PDB 5HZG). The last four
LRRs are labelled, and D3-CTH in LRR20 is coloured in orange.


© 2018 Springer Nature Limited. All rights reserved. 



Extended Data Fig. 5 | Structural analysis of D3-CTH-D14-GR24
complex. a, Packing of two D14 molecules that are N-terminally fused
with D3-CTH. The D3-CTH region in chain A is omitted. The GR24
D ring (sticks) is shown together with the surrounding 2Fo − Fc
electron-density map, calculated before the compound was modelled in
and contoured at 0.8σ. b, A close-up view of the GR24 D ring (sticks,
green) and its electron density, calculated as in a. c, Overall structure
of D14 (magenta) bound to D3-CTH (orange), with a GR24 D ring (green
sticks). The GR24 hydrolysis product D-OH (cyan sticks), revealed in the
D14-D-OH structure (PDB 3WIO), is shown on the basis of superposition
analysis. d, Kinetics of YLG hydrolysis by free D14 and D14 fused to
D3-CTH. Experiment repeated three times. e, Comparison of the interface
that D14 (magenta and brown) makes upon binding to D3-CTH (orange)
versus upon binding to ASK1-D3 (blue). The lid domain (brown) of D14
adopts an open and a closed conformation upon binding to D3-CTH and
ASK1-D3, respectively. f, Electrostatic-potential surface map of D14
bound to D3-CTH (orange). The dashed line indicates the C-terminal
region of D3 that would otherwise be free, if D3-CTH were not fused to
another copy of D14 in the crystal. g, Conformational changes in the lid
domain of D14, induced by D3-CTH binding, as revealed by superposition
analysis between D3-CTH-bound (magenta) and apo D14 (grey, PDB 4IH9).
Arrows indicate the rotation of the lid domain of D14, induced by
D3-CTH (orange), relative to the catalytic triad shown in sticks.
h, Superposition analysis of apo D14 (PDB 3W04) and D14 bound to
D3-CTH, which highlights a possible allosteric pathway that connects
Leu707 of D3-CTH to the catalytic triad of D14. Arrows indicate
conformational changes within D14 that are induced by binding to
D3-CTH.
Extended Data Fig. 6 | The formation of the D3-D14-D53 complex
is mediated by the D2 domain of D53. a, Pull-down assay using
recombinant ASK1-D3, His-D14, and GST-tagged N domain (D53-N),
D1 domain (D53-D1) or D2 domain of D53. b-d, Size-exclusion
chromatography analyses of the interaction between: full-length
GST-D53, D14-GR24 and ASK1-D3 (b), D14-GR24 and either
ASK1-D3 or ASK1-D3(ΔCTH) (c), and D14-GR24 and D53-D2 with
ASK1-D3(ΔCTH) (d). All gels were resolved by SDS-PAGE and analysed
by western blot using anti-GST and anti-His antibodies (as indicated
under a) or Coomassie blue staining (b-d). All experiments shown in a-d
were repeated independently at least three times.

Extended Data Fig. 7 | D3-CTH facilitates the binding of the D2 domain
of D53 to D14-D3. a, GST pull-down assay using GST-D53-D2 or the
GST-tagged D2 domain of the d53 (GST-d53-D2) mutant with non-tagged
D14, in the presence or absence of D3-CTH as indicated. b, AlphaScreen
data showing the ability of the D3-CTH peptide (28 amino acids,
D3(693-720)) to promote the interaction between D53-D2 and D14 in a
dose-dependent manner; D3(693-707) (15 amino acids) and D3(708-720) (13
amino acids) peptides did not stimulate binding. DMSO (indicated as 'no
peptide') served as control (data are mean ± s.d. of biological triplicates).
c, GST pull down using recombinant GST-D53-D2 and His-D3-ASK1
in the presence of recombinant D14 wild type (WT), D14(A223E),
D14(S224E) and GR24 as indicated. d, GST pull down in the presence
of the D3-CTH peptide with or without GR24, and in the presence of
GST-D14 wild type or GST-D14(S224E). BSA was used in the assay to
prevent non-specific interactions. MG132 was added as indicated. Proteins
were resolved using SDS-PAGE, and were visualized by Coomassie blue
staining or western blot using anti-GST antibodies. The mutant (MT)
D3-CTH peptide contains four amino acid mutations that were designed
to disrupt the D14-D3-CTH interface: E700R, L707R, D719R and D720R.
e, f, Degradation of GST-D53-D2 in the Col-0 (e) or max2-1 (f) A. thaliana
cell-free extract system. GST-D53-D2 was resolved at the indicated time
in the presence or absence of the wild-type D3-CTH peptide (e, top) or a
mutant (MT) (e, bottom), and in the presence of D3 and either D14 wild
type or the D14(S224E) mutant (f). g, Time-dependent degradation of
GST-D53-D2 and GST-d53-D2 in Arabidopsis seedlings of Col-0 extracts.
Proteins were resolved by SDS-PAGE, and analysed by western blot
using anti-GST antibody. MG132 indicates the addition of proteasome
inhibitor. h, Size-exclusion chromatography analysis of complex formation
among D53-D2, ASK1-D3 and D14 in the presence of YLG. i, Kinetics of
YLG hydrolysis by D14 in the presence of ASK1-D3 and D53-D2 at two
concentrations. Gels were resolved by SDS-PAGE and analysed by western
blot using anti-GST and anti-His antibodies as indicated under c, e-g. All
experiments were repeated independently at least three times.
Extended Data Fig. 8 | A model for strigolactone perception and
signalling. A model of the activity cycle that underlies strigolactone-induced
and $\mathrm{SCF}^{\mathrm{D3-D14}}$-mediated D53 polyubiquitination. D3 adopts
two conformational states with a structurally variable CTH (left). With a
dislodged CTH, D3 binds and inhibits D14 in its open conformation, until
D53 is loaded (top). D53 binding re-activates D14, which can hydrolyse
strigolactones after or while D53 is polyubiquitinated. The strigolactone
hydrolysis intermediate then stabilizes the closed conformation of D14,
which converts D3-CTH into its engaged form. The resulting complex
can ubiquitinate D14 and feed D3 back to the activity cycle (right).
CLIM-bound D14 might participate in D53 polyubiquitination or in an
alternative path (bottom). It remains unknown how many strigolactone
molecules are hydrolysed during the polyubiquitination of each D53
molecule.
Extended Data Table 1 | Data collection and refinement statistics

| | ASK1-D3 (form 1), native | ASK1-D3 (form 1), K2Pt(NO2)4 | ASK1-D3 (form 2) | ASK1-D3 (form 3) | D3-CTH-D14-GR24 |
|---|---|---|---|---|---|
| Data collection | | | | | |
| Space group | C2 | C2 | P2₁ | P2₁ | P6; |
| Cell dimensions a, b, c (Å) | 233.7, 79.7, 153.4 | 237.4, 79.8, 151.7 | 79.4, 130.4, 94.3 | 77.9, 113.3, 92.8 | 183.8, 183.8, 153.6 |
| Cell dimensions α, β, γ (°) | 90, 128.6, 90 | 90, 129.7, 90 | 90, 99.4, 90 | 90, 99, 90 | 90, 90, 120 |
| Resolution (Å) | 50.00-2.50 (2.54-2.50) | 50.00-2.50 (2.54-2.50) | 50.00-2.40 (2.49-2.40) | 50.00-3.00 (3.05-3.00) | 50.00-2.40 (2.44-2.40) |
| Rsym | 0.126 (0.403) | 0.128 (0.539) | 0.179 (0.939) | 0.129 (0.658) | 0.172 (0.699) |
| I/σI | 35.5 (2.0) | 63.7 (2.0) | 26.4 (3.0) | 39 (1.3) | 60.1 (2.5) |
| Completeness (%) | 98.8 (88.4) | 95.9 (71.6) | 98.9 (97.8) | 96.6 (97.8) | 100 (99.8) |
| Redundancy | 4.7 (3.6) | 8.1 (6.3) | 12 (5.6) | 3.7 (3.5) | 11.7 (8.9) |
| Refinement | | | | | |
| Resolution (Å) | 2.50 | - | 2.40 | 3.00 | 2.40 |
| No. reflections | 73577 | - | 73464 | 31050 | 115753 |
| Rwork / Rfree (%) | 20.0/22.5 | - | 19.2/21.8 | 22.0/25.6 | 24.8/30.0 |
| No. atoms | 11832 | - | 12231 | 10771 | 17166 |
| No. atoms: protein | 11366 | - | 11737 | 10771 | 16607 |
| No. atoms: ligand/ion | 0 | - | 0 | 0 | 8 |
| No. atoms: water | 466 | - | 494 | 0 | 551 |
| B-factors: protein | 39.3 | - | 35.5 | 65.6 | 32.4 |
| B-factors: ligand/ion | N/A | - | N/A | N/A | 38.6 |
| B-factors: water | 34.9 | - | 31.9 | N/A | 24.1 |
| R.m.s. deviations: bond lengths (Å) | 0.010 | - | 0.010 | 0.008 | 0.011 |
| R.m.s. deviations: bond angles (°) | 1.48 | - | 1.40 | 1.31 | 1.27 |
| Ramachandran favored (%) | 97.9 | - | 98.5 | 96.5 | 96.0 |
| Ramachandran allowed (%) | 2.1 | - | 1.5 | 3.5 | 4.0 |
| Ramachandran outliers (%) | 0 | - | 0 | 0 | 0 |
| PDB ID | 6BRO | - | 6BRP | 6BRQ | 6BRT |

This table describes the data collection, phasing and refinement statistics of ASK1-D3 crystals in three forms, as well as crystals of D3-CTH-D14-GR24. Values in parentheses are for the highest-resolution shell.
Spatially resolved rotation of the broad-line region
of a quasar at sub-parsec scale 


GRAVITY Collaboration* 


The broadening of atomic emission lines by high-velocity motion
of gas near accreting supermassive black holes is an observational
hallmark of quasars. Observations of broad emission lines could
potentially constrain the mechanism for transporting gas inwards
through accretion disks or outwards through winds. The size
of regions for which broad emission lines are observed (broad-line
regions) has been estimated by measuring the delay in light
travel time between the variable brightness of the accretion
disk continuum and the emission lines, a method known as
reverberation mapping. In some models the emission lines arise
from a continuous outflow, whereas in others they arise from
orbiting gas clouds. Directly imaging such regions has not
hitherto been possible because of their small angular size (less than
10⁻⁴ arcseconds). Here we report a spatial offset (with a spatial
resolution of 10⁻⁵ arcseconds, or about 0.03 parsecs for a distance
of 550 million parsecs) between the red and blue photo-centres of
the broad Paschen-α line of the quasar 3C 273 perpendicular to the
direction of its radio jet. This spatial offset corresponds to a gradient
in the velocity of the gas and thus implies that the gas is orbiting
the central supermassive black hole. The data are well fitted by a
broad-line-region model of a thick disk of gravitationally bound
material orbiting a black hole of 3 × 10⁸ solar masses. We infer a
disk radius of 150 light days; a radius of 100-400 light days was
found previously using reverberation mapping. The rotation axis
of the disk aligns in inclination and position angle with the radio
jet. Our results support the methods that are often used to estimate
the masses of accreting supermassive black holes and to study their
evolution over cosmic time.

We observed the quasar 3C 273 at the Very Large Telescope
Interferometer (VLTI) in Chile using the recently deployed GRAVITY
instrument on eight nights between July 2017 and May 2018. The
instrument coherently combines the light of the four 8-m telescopes to
form interferometric amplitudes and phases on each of the six baselines
(telescope pairs). The amplitudes measure the angular extent (size)
of a structure, whereas the phases provide its on-sky position. The
continuum dust emission was partially resolved (diameter of about
0.3 mas) and the broad-line region was more compact. We extracted
differential phase curves (interferometric phase as a function of
wavelength, measured relative to the continuum) for each of the six
baselines near the (cosmologically) redshifted Paα line, which was
observed at a wavelength of 2.17 μm, and averaged them over time to
increase the signal-to-noise ratio. The differential phase (Δφ) measures
the shift in the photo-centre on the sky (Δx) of the total (line +
continuum) image along the projected baseline direction for an
unresolved source:


$$\Delta\phi(\lambda) = -2\pi f_{\mathrm{line}} \frac{B}{\lambda}\, \Delta x(\lambda) \qquad (1)$$


where B is the sky-projected length of the baseline (telescope
separation) and $f_{\mathrm{line}}$ is the ratio of the emission line to the
total flux in each wavelength channel (Methods). In this way, precise
phase measurements from spectro-astrometry provide spatial information
on scales much smaller than the interferometric beam. 3C 273 is the most
attractive target for spectro-astrometry because it is the brightest
nearby quasar, with a large region size measured from reverberation
mapping, and its strong Paα line is observable in the near-infrared
K band in which GRAVITY operates.
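
As a rough illustration of the scale implied by equation (1), the sketch below evaluates the differential phase for a given photo-centre offset and inverts the relation. The function names and the input values (a 100 m projected baseline, $f_{\mathrm{line}}$ = 0.4 and a 10 μas offset) are illustrative assumptions rather than values taken from the analysis; only the wavelength of 2.17 μm comes from the text.

```python
import math

# One microarcsecond expressed in radians.
UAS_TO_RAD = math.pi / (180.0 * 3600.0 * 1.0e6)


def differential_phase_deg(f_line, baseline_m, wavelength_m, delta_x_uas):
    """Equation (1): phase in degrees for a photo-centre offset in microarcseconds."""
    delta_x_rad = delta_x_uas * UAS_TO_RAD
    phase_rad = -2.0 * math.pi * f_line * (baseline_m / wavelength_m) * delta_x_rad
    return math.degrees(phase_rad)


def photocentre_shift_uas(phase_deg, f_line, baseline_m, wavelength_m):
    """Inverse of equation (1): offset in microarcseconds from a phase in degrees."""
    phase_rad = math.radians(phase_deg)
    delta_x_rad = -phase_rad * wavelength_m / (2.0 * math.pi * f_line * baseline_m)
    return delta_x_rad / UAS_TO_RAD


if __name__ == "__main__":
    # Hypothetical inputs: 100 m baseline, 2.17 um, f_line = 0.4, 10 uas offset.
    phase = differential_phase_deg(0.4, 100.0, 2.17e-6, 10.0)
    print(f"differential phase: {phase:.3f} deg")
```

Under these assumed inputs the phase is a few tenths of a degree, which shows why sub-degree phase precision is needed to reach offsets of order 10 μas.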

The strength of the phase signal depends on the kinematics.
Turbulent motion produces zero differential phase, whereas a spatial
velocity gradient results in wavelength-dependent phase shifts. We
detected such a velocity gradient on three of our six baselines (UT4-3,
UT4-2 and UT4-1; Methods): those that were not aligned with
the direction of the jet of 3C 273. Averaging the data of these three
baselines, we find phase peaks of 0.25° ± 0.06° over multiple spectral
channels (Fig. 1a). We reject the null hypothesis of zero differential
phase at a significance of more than 5σ (Methods). This precision in
the differential phase is more than ten times better than that achieved
previously in interferometry of active galactic nuclei (AGN).

Using the differential phase data from all baselines, we measure the
two-dimensional position of the photo-centre (the model-independent
image centroid) in each wavelength channel that contributes sufficient
line flux that equation (1) can be inverted ($f_{\mathrm{line}}$ > 0.35; seven
channels). We clearly detect a velocity gradient (Fig. 1b, spatial
separation between red and blue sides of the line), which is nearly
perpendicular to the large-scale position angle of the radio jet of
3C 273 (222°). By fitting a model with symmetric photo-centres on the
red and blue sides of the line to all data, we determine a measured
offset of Δx = ±(−9.5, 6.8) ± (1.6, 1.1) μas in right ascension and
declination to the blue and red sides, which corresponds to a radial
shift of about 0.03 pc (6,000 AU). This precision of about 1 μas
corresponds to about 500 AU at a redshift of z = 0.158 (550 Mpc).
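
The conversions between angular and physical scales quoted in the text can be checked with a short sketch. It assumes the small-angle approximation and treats the quoted 550 Mpc as the distance that relates angle to transverse size; the function name is hypothetical.

```python
import math

ARCSEC_TO_RAD = math.pi / (180.0 * 3600.0)
AU_PER_PC = 648000.0 / math.pi        # astronomical units in one parsec
LIGHT_DAYS_PER_PC = 3.26156 * 365.25  # 1 pc is about 3.26156 light years


def projected_size_pc(theta_uas, distance_mpc):
    """Transverse size in parsecs for an angle in microarcseconds (small-angle limit)."""
    theta_rad = theta_uas * 1.0e-6 * ARCSEC_TO_RAD
    return theta_rad * distance_mpc * 1.0e6


if __name__ == "__main__":
    distance_mpc = 550.0
    # Precision of about 1 uas, expressed in AU.
    print(projected_size_pc(1.0, distance_mpc) * AU_PER_PC)
    # Magnitude of the measured (-9.5, 6.8) uas offset, in pc and AU.
    offset_pc = projected_size_pc(math.hypot(9.5, 6.8), distance_mpc)
    print(offset_pc, offset_pc * AU_PER_PC)
```

With these assumptions, 1 μas maps to roughly 550 AU and the measured offset to roughly 0.03 pc, in line with the rounded values in the text.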

A velocity gradient perpendicular to the jet is strong evidence of 
‘ordered’ rotation, although the signal-to-noise ratio in each channel 
is too low to uniquely determine the rotation profile. Here we use 
‘ordered’ in the sense of coherent motion that produces a velocity 
gradient. For example, orbital motions with random angular momenta 
would not produce the phase signature that we detect. 

To constrain the size and structure of the line-emitting region, we
adopt a kinematic model of a collection of non-interacting point
particles ('clouds') with equal (arbitrary) line flux distributed in an
annular thick disk geometry and assuming circular orbits in the
gravitational potential of the black hole (Methods). The seven model
parameters that we use are the mean and inner radii of the broad Paα
line-emitting region (BLR) on the sky ($R_{\mathrm{BLR}}$ and
$R_{\mathrm{min}}$), its radial shape parameter (β, ranging from
concentrated around the mean to exponentially distributed), the opening
angle of the disk ($\theta_0$), the inclination and position angles of the
observer (i and PA), and the mass of the central black hole
($M_{\mathrm{BH}}$; see Fig. 1c). We find a good joint fit (reduced
χ² ≈ 1.3) to the Paα line profile and to the interferometric phases on
all baselines and 40 wavelength channels, and constrain the model
parameters using Markov chain Monte Carlo sampling (Table 1, Methods).
The centroids in each channel found by the model (Fig. 1b, dashed line)
are in good agreement with the observed centroids.

We infer a position angle of PA = 210° (with a 180° ambiguity)
and an inclination of i = 12° ± 2° (all intervals at 90% confidence) for
the rotation axis of the BLR. Measurements of the inclination angle
of the radio jet from superluminal motion range from 7° to 15°. The
close two-dimensional alignment of the rotation axis and the radio jet
confirms that the kinematics are dominated by ordered rotation. The
half opening angle of the gas distribution is 45°.

The mean radius of the Paα emitting region is found to
be $R_{\mathrm{BLR}}$ = 46 ± 10 μas (0.12 ± 0.03 pc), with an inner edge at
$R_{\mathrm{min}}$ = 11 ± 3 μas (0.03 ± 0.01 pc) and a roughly exponential
radial emission profile, with shape parameter β = 1.4 ± 0.2 (Methods). The
measured mean radius corresponds to 145 ± 35 light days, roughly half
the values obtained from previous reverberation mapping estimates
(260-380 light days) using Hβ and Hγ emission lines, but consistent
with the lower limit of 100 light days found from a subsequent
re-analysis. The discrepancy is probably due to the difficulty in measuring
long lags in the brightest quasars, and could be partially due to intrinsic
source variability (Methods).

Our inferred BLR radius is also a factor of roughly three smaller
than the continuum dust radius found from previous interferometry
measurements and from our own data. This has been found to be the
case for many AGN. The size is also much smaller than that found
tentatively from previous spectro-interferometry of 3C 273 using the
VLTI instrument AMBER ($R_{\mathrm{BLR}}$ ≈ 200 μas). The object was too
faint for fringe tracking in those observations. In addition, one of the
three baselines used by AMBER was in the direction of the jet, that is,
perpendicular to the disk. Our GRAVITY data, which have higher
precision by a factor of about 40, rule out such a large size (Extended
Data Fig. 4).

Fig. 1 | Main observational and modelling results. a, Paα line profile
(black points; right axis) of 3C 273 observed by GRAVITY, along with the
differential phase averaged over three baselines (blue points; left axis),
showing the 'S' shape that is typical for a velocity gradient. The error bars
represent 1σ. A thick-disk model of the BLR (dashed pink lines, see also c
and d) provides an excellent joint fit to the data. b, The observed centroid
positions of the photo-centre in several wavelength channels (indicated
by the colour scale; symbol size is proportional to the signal-to-noise
ratio) show a clear spatial separation between redshifted and blueshifted
emission: a velocity gradient at a position angle (PA) nearly perpendicular
to that of the radio jet ($\mathrm{PA}_{\mathrm{jet}}$; solid black line). The
centroid track of the BLR model is shown as a dashed line. c, Schematic
representation of the model parameters. The green shaded area shows the
geometry of the gas that surrounds the supermassive black hole (of mass
$M_{\mathrm{BH}}$), with the blue circle indicating an individual gas cloud.
The angular (θ′; normalized by the opening angle of the disk $\theta_0$) and
radial (r) distributions of the gas clouds are plotted on the left and
below, respectively. The rotation axis of the disk points along z′, which
is inclined by an angle i to the line of sight z. d, Velocity map (colour
scale) of the model that best fits the discrete clouds (points): a thick
disk geometry viewed nearly face-on. Disorder in the velocity map reduces
the observed shifts in the photo-centre (b) compared to the angular size
of the BLR (d).

*A list of participants and their affiliations appears at the end of the paper.

29 NOVEMBER 2018 | VOL 563 | NATURE | 657

The inferred inner edge of the Paα emission region is a result of
the cut-off in the line profile at ±4,000 km s⁻¹, which probably
corresponds to the location where Balmer and Paschen emission becomes
weak compared with that of higher-ionization lines. The best-fitting
structure is similar to that found from velocity-resolved reverberation
mapping of nearby Seyfert 1 galaxies, which suggests that the
properties of BLRs may not vary strongly with the luminosity or Eddington
ratio of AGN.

Table 1 | Estimates of the kinematic BLR model parameters

| Parameter | Value | Description |
|---|---|---|
| $R_{\mathrm{BLR}}$ (μas) | 46 ± 10 | Mean angular distance of the cloud from the black hole |
| $R_{\mathrm{min}}$ (μas) | 11 ± 3 | Minimum angular distance of the cloud from the black hole |
| β | 1.4 ± 0.2 | Radial distribution shape parameter |
| $\theta_0$ (°) | 45 | Half-opening angle of the disk |
| i (°) | 12 ± 2 | Inclination angle of the observer |
| PA (° E of N) | 210 | Position angle on the sky |
| $M_{\mathrm{BH}}$ (10⁸ M☉) | 2.6 ± 1.1 | Black-hole mass |

Values are medians and uncertainties represent 90% confidence ranges.

We infer the black-hole mass of 3C 273 directly from the model to
be $M_{\mathrm{BH}}$ = (2.6 ± 1.1) × 10⁸ M☉, where M☉ is the solar mass. In
reverberation mapping experiments, $M_{\mathrm{BH}}$ is obtained by combining
Balmer-line time-delay measurements with the gas velocity obtained from
the line profile. This requires the use of a velocity-inclination factor
$f = G M_{\mathrm{BH}} / (v^2 R_{\mathrm{BLR}})$, where G is the gravitational
constant and v is the gas velocity, which is usually defined as either the
second moment of the line profile $\sigma_{\mathrm{line}}$ or the full-width
at half-maximum of the line $\mathrm{FWHM}_{\mathrm{line}}$. Empirical mean
values of f (⟨f⟩) are obtained by matching the mass estimates from
reverberation mapping with those from the relationship between the
black-hole mass $M_{\mathrm{BH}}$ and the stellar velocity dispersion of
galaxy bulges σ*. Typically, ⟨f⟩ = 4.3 for v = $\sigma_{\mathrm{line}}$ and
⟨f⟩ = 1.1 for v = $\mathrm{FWHM}_{\mathrm{line}}$. For the observed 3C 273 Paα
line, we measure $\sigma_{\mathrm{line}}$ = 1,400 km s⁻¹ (f = 4.7 ± 1.4) and
$\mathrm{FWHM}_{\mathrm{line}}$ = 2,700 km s⁻¹ (f = 1.3 ± 0.2), both in good
agreement with the mean values used for Hβ in the literature.
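
The quoted values of f can be reproduced approximately from the velocity-inclination relation using the central values given in the text. The sketch below is only a consistency check: the function names are hypothetical, the value of G in parsec-based units is a standard constant rather than something stated in the paper, and uncertainties are not propagated.

```python
# Gravitational constant in pc (km/s)^2 per solar mass.
G_PC = 4.30091e-3


def virial_factor(m_bh_msun, velocity_kms, radius_pc):
    """f = G * M_BH / (v^2 * R_BLR) for the chosen velocity measure."""
    return G_PC * m_bh_msun / (velocity_kms ** 2 * radius_pc)


def black_hole_mass(f, velocity_kms, radius_pc):
    """Invert the relation: M_BH = f * v^2 * R_BLR / G, in solar masses."""
    return f * velocity_kms ** 2 * radius_pc / G_PC


if __name__ == "__main__":
    m_bh = 2.6e8     # solar masses (central value from the text)
    r_blr = 0.12     # pc (central value from the text)
    print(virial_factor(m_bh, 1400.0, r_blr))  # v = sigma_line
    print(virial_factor(m_bh, 2700.0, r_blr))  # v = FWHM_line
```

The two printed values come out close to 4.7 and 1.3, matching the factors reported for the second moment and the full-width at half-maximum.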

Because they used a larger $R_{\mathrm{BLR}}$ (260-380 light days) for 3C 273,
previous studies found a mass of roughly double what we found. The
mass is correlated with inclination: higher mass requires lower
inclination to match the observed line width. Our $R_{\mathrm{BLR}}$ and
$M_{\mathrm{BH}}$ could also in principle be underestimated: a larger BLR
with added disorder may also be able to explain the phase signatures that
we observe. Our highest-velocity photo-centre on the blue side (which has
the lowest signal-to-noise ratio and is therefore not shown in Fig. 1) is
offset (by 1σ-2σ) to the south and lies along the jet (Methods, Extended
Data Fig. 2). This could be a hint of a second kinematic component
contributing line flux.

Our results support the fundamental assumptions of reverberation
mapping. The broad line width is dominated by bound motion in the
gravitational potential of the black hole, and our inferred values of f
match what is commonly found from the mean population. The size of
the region is about half what is typically assumed, but within the large
range of results reported for 3C 273. Quantitative size comparisons
should be done using Seyfert galaxies with shorter lags, for which the
radius is better determined from reverberation mapping. A comparison
between velocity-resolved reverberation and (contemporaneous)
interferometry in the same object would be particularly promising:
reverberation mapping can unambiguously distinguish rotation from
inflow and outflow, interferometry measures the orientation on the
sky, and both techniques independently probe size and structure. The
thick disk structure that we infer in 3C 273 could have a physical origin
either in the inflowing surface layer of a geometrically thick accretion
disk or in a rotating wind launched from the surface of a thin disk.
Distinguishing these scenarios will require detailed model predictions,
because strong outflows can still exhibit dominant signatures
of rotation.

The combination of low inclination angles, large opening angles,
a small radius and the alignment of our baselines with the rotation
axis reduces the observed phase amplitude (roughly 0.3°) from the
predicted value (more than 1°). Nonetheless, we have shown that the
sensitivity of GRAVITY allows measurements of the size and
structure of the BLR. Future observations could aim to measure a rotation
curve and search for additional (for example, outflow) components
and structural variability. With current instrumentation,
spectro-interferometry is limited to the brightest AGN (with near-infrared
magnitudes of less than 12 and visible magnitudes of less than 15).
With an on-going observing programme, we plan to measure the
properties of line and torus emission regions in a small sample of quasars
and Seyfert galaxies. The angular size of each component scales with
the observed optical flux, making interferometry particularly well
suited to exploring the physical origin of the broad-line region and
to measuring black-hole masses in luminous quasars such as 3C 273,
which are more representative of the supermassive black holes found
in large samples out to high redshift.
METHODS


Observations. The observations were taken at the VLTI in Chile using the
second-generation GRAVITY instrument and the four 8-m unit telescopes. We
chose the medium-resolution mode of GRAVITY with 90 independent spectral
elements (λ/Δλ ≈ 500) dispersed across 210 pixels. All data were obtained in
single-field on-axis, combined polarization mode.

Each observation followed the same sequence. Each unit telescope locked its 
adaptive optics (MACAO) module on 3C 273. Once the adaptive optics loops 
were closed, the telescope beams were coarsely aligned on the VLTI laboratory 
camera IRIS. In a second step, the internal beam tracking of GRAVITY aligned 
the fringe-tracking and science fibres on the target. 

The science exposures contain ten 30-s integrations (NDIT = 10 and
DIT = 30 s). Fringe tracking on faint sources such as 3C 273 (K ~ 10, V = 13) is
difficult, so we integrated deeply on-source, taking only occasional sky exposures,
often at the end of the integration. In July 2017, January 2018 and May 2018,
interferometric calibrators were also observed (although not required for
differential phase measurements). Extended Data Table 1 lists the observing
nights, the integration time, the seeing (as provided by the DIMM seeing monitor)
and the coherence time.
Data reduction. We used the standard GRAVITY pipeline to process the
data. Each exposure consists of ten science frames, which are averaged after
processing. Each frame is corrected for background bias by subtracting the 
closest sky exposure. The detector-noise bias is calculated from dark exposures 
with the same settings as the science exposures. Flat-field and wavelength cali- 
bration were done on the internal calibration source. The effective wavelength 
for each spectral element was calibrated by modulating the fibre lengths of the 
fringe-tracking and science channels and using the internal laser metrology as 
a wavelength reference. 

The science data were reduced using a pixel-to-visibility matrix. This matrix
represents the matrix-encoded instrument transfer function, which includes the 
relative throughput, coherence, phase shift and cross-talk for each pixel. Applying 
the matrix to the detector frames yields the instrument-calibrated complex 
visibilities. In the next step, the complex visibilities from the science channel were 
phase-referenced to those from the fringe-tracking channel using the laser metrol- 
ogy. The effective optical path difference for each spectral channel was calculated 
from the delay measured by the laser metrology and the differential dispersion 
between the wavelengths of the laser and of the channel. This calculation yields 
phase-referenced complex visibilities. The GRAVITY pipeline removes a mean 
and slope from the raw visibility phase calculated using all wavelength channels 
to create a differential phase on each baseline. 

We used an alternative method (developed for VLTI/AMBER) and computed
the differential phase of each channel as in equation (7) of that work, which
excludes the channel being computed (the work channel) when calculating the
mean and slope. This method produces consistent results but improves our
phase errors, typically by about 10%-20%.
To account for the observatory transfer function (coherence loss due to vibra- 
tions, uncorrected atmosphere, birefringence, and so on), it is common practice 
to observe a calibrator star close to the science target. We did this for three epochs 
to obtain continuum visibilities. 
Data processing. For each exposure, we further removed broad instrumental 
shapes in the differential phases. This was done by convolving the differential 
phase versus the wavelength from each baseline with a Gaussian with a FWHM of 
24 pixels, and then dividing the actual data by this smoothed version. The width 
was chosen to flatten the differential phases without affecting their behaviour near 
the Paα line. After flattening, no calibrator shows any significant differential phase
signature in any part of the spectrum, down to limits of less than 0.1°. Therefore, 
no further instrumental bias needs to be removed from the 3C 273 data. 

From each night, we selected exposures for which (i) the fringe tracking was 
working more than 80% of the time, and (ii) the root-mean-square of the differ- 
ential phase curves was less than 3° on all baselines. This selection removed 25% 
(26/104) of the 5-min exposures. Table 1 summarizes the data used in the analysis 
presented here. 
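
The two selection criteria can be written as a simple filter. The sketch below is illustrative only: the `Exposure` record and its field names are hypothetical and do not correspond to the actual pipeline products, while the thresholds (80% tracking time, 3° rms on all baselines) are those stated above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Exposure:
    """Hypothetical per-exposure summary; field names are illustrative."""
    tracking_fraction: float     # fraction of time the fringe tracker was working
    phase_rms_deg: List[float]   # rms of the differential phase curve, one per baseline


def keep_exposure(exp, min_tracking=0.8, max_rms_deg=3.0):
    """Apply criteria (i) and (ii): good tracking and low phase rms on all baselines."""
    good_tracking = exp.tracking_fraction > min_tracking
    low_noise = all(rms < max_rms_deg for rms in exp.phase_rms_deg)
    return good_tracking and low_noise


def select_exposures(exposures, min_tracking=0.8, max_rms_deg=3.0):
    """Return only the exposures that pass both criteria."""
    return [e for e in exposures if keep_exposure(e, min_tracking, max_rms_deg)]


if __name__ == "__main__":
    sample = [
        Exposure(0.95, [0.8, 0.9, 1.0, 0.7, 0.9, 1.1]),  # passes
        Exposure(0.60, [0.8, 0.9, 1.0, 0.7, 0.9, 1.1]),  # poor fringe tracking
        Exposure(0.95, [0.8, 0.9, 3.5, 0.7, 0.9, 1.1]),  # one noisy baseline
    ]
    print(len(select_exposures(sample)))
```

Requiring the rms limit on every baseline, rather than on average, means that a single noisy baseline is enough to reject an exposure, which is how criterion (ii) is worded.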

We then averaged the exposures in time for each night for each of the four 
epochs to reduce the phase noise. For the first three epochs, the changes in the 
coordinates (u, v) of the visibility domain for each baseline are small during inte- 
grations of less than 1 h. In May 2018, the integrations were longer. We checked for 
differences between the first and second half of each night, and for the expected 
changes in phase signatures in the best-fitting model (see below). In both cases, the 
changes are smaller than our errors and so we consider it safe to time-average per 
baseline at all epochs. The weights for each exposure and baseline were calculated 
from the root-mean-square of the phase noise over a broad spectral region centred
on the Paα line (2.06–2.28 μm). The averaged differential phase curves have
a residual scatter of 0.2°-0.3° in each epoch. Individual 5-min exposures typically 
reached 0.7°-1.0°. In each epoch, the precision reached is a factor of roughly ten 
higher than previous spectro-interferometry of AGN.
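As an illustration of the time-averaging step, the sketch below combines several exposures of one spectral channel on one baseline with a weighted mean. It assumes inverse-variance weights built from the per-exposure root-mean-square phase noise; the text only states that the weights were calculated from that root-mean-square, so the exact weighting is an assumption, and the phase values are hypothetical.

```python
import math

def weighted_mean_phase(phases_deg, rms_deg):
    """Inverse-variance weighted mean of phases and its formal error (degrees)."""
    weights = [1.0 / (s * s) for s in rms_deg]  # assumed weighting: 1 / rms^2
    total = sum(weights)
    mean = sum(w * p for w, p in zip(weights, phases_deg)) / total
    error = math.sqrt(1.0 / total)
    return mean, error

# Hypothetical channel: twelve 5-min exposures with 0.7-1.0 degree noise each.
phases = [0.9, -0.4, 0.3, 1.1, -0.7, 0.2, 0.6, -0.1, 0.5, -0.9, 0.8, 0.1]
rms = [0.7, 0.8, 1.0, 0.9, 0.7, 0.8, 1.0, 0.9, 0.7, 0.8, 1.0, 0.9]
mean, error = weighted_mean_phase(phases, rms)
print(f"mean = {mean:.2f} deg, formal error = {error:.2f} deg")
```

With about a dozen exposures at this noise level, the formal error falls in the 0.2°–0.3° range quoted for the averaged curves.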
We extracted the spectral line profile in each exposure by summing the photometric
flux of each of the four telescopes. We removed the shape of the instrument
response by dividing the 3C 273 spectrum by that of a calibrator star. Spectral lines 
from the calibrator star were removed by first dividing its spectrum by templates 
from the NASA Infrared Telescope database. For March and April 2018, for which
no calibrator was taken, we used that from the January 2018 observation instead. 
Then the red continuum slope was removed before averaging the result over all 
exposures within each epoch. The line shape and strength were stable between 
epochs to within 5%. That scatter could be due to either intrinsic variability or 
systematic error in the spectral extraction. Conservatively, we assumed that the line 
profile and flux are constant and averaged the four epochs to get a single line pro- 
file. The errors are taken as the quadrature sum of the root-mean-square between 
epochs and the statistical errors on the measured photometric flux. 
Detection of a velocity gradient. As described in the main text, we first used 
the differential phase data to fit for a best centroid position at each spectral 
channel across the line. The model has an (x, y) position with a phase given by 
$\Delta\phi_{ij} = -2\pi f_{{\rm line},i}(u_j x_i + v_j y_i)$, where the index $i$ corresponds to the seven spectral
channels used (in which the line intensity relative to the continuum $f_i > 0.35$, with
$f_{{\rm line},i} = f_i/(1 + f_i)$) and $j$ corresponds to each of the 24 baselines (6 × 4 epochs). This is
the form in the marginally resolved limit, in which we can expand the exponential
in the complex visibility and keep only the first-order term. In each channel
we fit the observed phases to find a best centroid position $(x_i, y_i)$. The
results are shown in Extended Data Fig. 2, in which the red and blue wavelength 
channels clearly cluster on opposite sides of the line. The 1σ confidence intervals
are shown as ellipses. 
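Because the phase model in the marginally resolved limit is linear in the centroid offset, the per-channel fit reduces to a two-parameter linear least-squares problem. The sketch below illustrates this under stated assumptions: the baseline coordinates are hypothetical values in units of Mλ, the fit is unweighted, and the function names are invented for the example rather than taken from the authors' pipeline.

```python
import math

MUAS_TO_RAD = math.pi / (180.0 * 3600.0 * 1.0e6)  # microarcseconds to radians

def model_phase_deg(f_line, u_mlambda, v_mlambda, x_muas, y_muas):
    """Differential phase (degrees) of a photocentre offset, first-order limit."""
    arg = (u_mlambda * 1.0e6) * (x_muas * MUAS_TO_RAD) \
        + (v_mlambda * 1.0e6) * (y_muas * MUAS_TO_RAD)
    return math.degrees(-2.0 * math.pi * f_line * arg)

def fit_centroid(f_line, uv_mlambda, phases_deg):
    """Unweighted linear least squares for the centroid (x, y) in microarcseconds."""
    # phase = a * x + b * y, with a and b the phases of unit offsets.
    rows = [(model_phase_deg(f_line, u, v, 1.0, 0.0),
             model_phase_deg(f_line, u, v, 0.0, 1.0)) for u, v in uv_mlambda]
    saa = sum(a * a for a, _ in rows)
    sbb = sum(b * b for _, b in rows)
    sab = sum(a * b for a, b in rows)
    sap = sum(a * p for (a, _), p in zip(rows, phases_deg))
    sbp = sum(b * p for (_, b), p in zip(rows, phases_deg))
    det = saa * sbb - sab * sab
    return (sap * sbb - sab * sbp) / det, (saa * sbp - sab * sap) / det

f = 0.6                      # hypothetical line-to-continuum ratio
f_line = f / (1.0 + f)
baselines = [(20, 15), (-35, 10), (5, -45), (50, 25), (-10, 55), (30, -30)]
phases = [model_phase_deg(f_line, u, v, -9.5, 6.8) for u, v in baselines]
x_fit, y_fit = fit_centroid(f_line, baselines, phases)
```

Applied to noiseless synthetic phases, the fit returns the input offset; with real data each phase would carry its own weight.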

To estimate the significance of the detection, we consider a null hypothesis of 
zero phase everywhere and compare this to a model with single centroid positions 
(x, y) and (—x, —y) in right ascension and declination on the blue and red sides 
of the line. We calculate the model phases as above for channels whose centroids 
appear to deviate significantly from zero (four red and four blue channels) and 
assign them zero phase elsewhere. From least-squares fitting, we find (x, y) = 
(−9.5, 6.8) ± (1.1, 1.6) μas. The χ² for the null hypothesis and model are 1,417
and 1,308, respectively, with 960 data points. An F-test rejects the x = y = 0 null
hypothesis with a P value of $10^{-17}$, corresponding to more than 8σ. The spectral
channels are more finely sampled than their resolution element, so neighbouring 
channels are correlated. To estimate the effect of this correlation, we repeat the 
above test using half of the channels and find a P value for the null hypothesis of 
less than $10^{-8}$, or a detection significance of more than 5.5σ.
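The significance estimate can be reproduced approximately from the quoted χ² values. The sketch below is one possible reading of the test, not the authors' code: it assumes two extra parameters (x and y), residual degrees of freedom equal to the number of data points minus two, and a two-sided conversion from P value to Gaussian σ. For two numerator degrees of freedom the F-distribution survival function has a closed form, so only the standard library is needed.

```python
from statistics import NormalDist

def f_test_two_extra_params(chi2_null, chi2_model, n_data):
    """F statistic and P value for adding two parameters to a null model."""
    n_extra = 2
    dof = n_data - n_extra
    f_stat = ((chi2_null - chi2_model) / n_extra) / (chi2_model / dof)
    # Survival function of F(2, dof): (1 + 2 F / dof) ** (-dof / 2).
    p_value = (1.0 + n_extra * f_stat / dof) ** (-dof / 2.0)
    return f_stat, p_value

def two_sided_sigma(p_value):
    """Gaussian-equivalent significance for a two-sided P value."""
    return -NormalDist().inv_cdf(p_value / 2.0)

f_stat, p_value = f_test_two_extra_params(1417.0, 1308.0, 960)
sigma = two_sided_sigma(p_value)
print(f"F = {f_stat:.1f}, P = {p_value:.1e}, significance = {sigma:.1f} sigma")
```

Under these assumptions the result exceeds 8σ, in line with the significance stated in the text.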

The bluest channel in Extended Data Fig. 2 moves to the south, along the jet. 
Its phase agrees with those of other blue channels for the three off-jet baselines 
(Fig. 1a). However, it shows a −0.2° phase in the average of the other three baselines.
The significance is only 1σ–2σ, and we do not interpret it further here. If
detected in future observations, the alignment with the large-scale jet direction 
could be a signature of outflowing gas at high velocity. 

At certain baselines and spectral channels we see apparently systematic features
that can extend over two to three channels, away from any spectral lines. None of
these reaches close to the signal-to-noise ratio of the signature studied here. This 
is true for all GRAVITY AGN and calibrator data examined. 
BLR model description and fitting procedure. We adopt a phenomenological 
model of the Paα emitting region to interpret our data, closely following previous
work. The model comprises a large number of non-interacting test particles
on circular orbits around the central black hole under the sole influence of gravity. 
The particles represent dense, low-filling-factor, line-emitting gas clouds. Their 
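distances from the black hole are

$$r = R_{\rm S} + F R_{\rm BLR} + g(1 - F)\beta^2 R_{\rm BLR}$$

where $R_{\rm S} = 2GM_{\rm BH}/c^2$ is the Schwarzschild radius, $R_{\rm BLR}$ is the mean radius,
$F = R_{\rm min}/R_{\rm BLR}$ is the fractional inner radius, $\beta$ is the shape parameter and
$g = p(x \mid 1/\beta^2, 1)$ is drawn randomly from a Gamma distribution:

$$p(x \mid a, \theta) = \frac{x^{a-1}\exp(-x/\theta)}{\Gamma(a)\,\theta^{a}}$$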

No line-emitting clouds are present inside $R_{\rm min}$ ($R_{\rm min} \gg R_{\rm S}$ for the viable models
found for 3C 273), and their distribution is allowed to vary from a Gaussian distribution
concentrated around the mean ($0 < \beta < 1$) to an exponential ($\beta = 1$) or
steeper ($1 < \beta < 2$) profile concentrated at the inner radius. The angular distribution
is specified by a half-thickness $\theta_o$, and the clouds are placed at random positions
along their orbits. The structure is viewed at an inclination angle $i$ and rotated
in the sky plane by a position angle PA measured in degrees east of north. In total,
the model as implemented has seven free parameters: $R_{\rm BLR}$, $F$, $\beta$, $M_{\rm BH}$, $\theta_o$, $i$ and PA.
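The radial distribution described above can be sampled directly. The following is a minimal sketch, assuming the Gamma shape parameter is 1/β² with unit scale as stated in the text; the parameter values (F = 0.3, β = 1.4, mean radius 145 light days) and the function name are illustrative choices, not the fitted model.

```python
import random

def sample_radii(n_clouds, r_blr, frac_inner, beta, r_s=0.0, seed=1):
    """Draw cloud radii r = R_S + F*R_BLR + g*(1 - F)*beta^2*R_BLR."""
    rng = random.Random(seed)
    shape = 1.0 / beta ** 2          # Gamma shape parameter, unit scale
    radii = []
    for _ in range(n_clouds):
        g = rng.gammavariate(shape, 1.0)
        radii.append(r_s + frac_inner * r_blr
                     + g * (1.0 - frac_inner) * beta ** 2 * r_blr)
    return radii

# Illustrative values only; radii are in the same units as r_blr (light days).
radii = sample_radii(20000, r_blr=145.0, frac_inner=0.3, beta=1.4)
mean_radius = sum(radii) / len(radii)
inner_edge = min(radii)
print(f"mean radius = {mean_radius:.1f}, innermost cloud = {inner_edge:.1f}")
```

Since the mean of the Gamma draw is 1/β², the sample mean converges to $R_{\rm S} + R_{\rm BLR}$ whatever the value of β, which is why $R_{\rm BLR}$ can be read as the mean radius, and no cloud falls inside $F R_{\rm BLR}$.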

Recent velocity-resolved reverberation mapping studies used additional param- 
eters describing inflow and outflow, anisotropic emission and asymmetry in the 
angular distribution. Here we choose the minimal model required by our data.
Rotation explains the velocity gradient perpendicular to the jet axis, an inner radius
is justified by cut-offs in the spectrum at roughly ±4,000 km s⁻¹, and steep radial
and thick angular distributions can produce the observed line shape. Although
we find a satisfactory fit (see below), including elliptical or radial motion would
probably change our inferred parameters. In particular, additional model components
that add disorder could potentially allow models with larger $R_{\rm BLR}$ and
$M_{\rm BH}$ to fit the data.

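Assuming that emitted Paα photons free-stream through the BLR once emitted,
we conveniently obtain spectro-interferometric observables as sums over clouds
binned by the observed wavelength $\lambda_{\rm obs}$:

$$\lambda_{\rm obs} = \lambda_{\rm emit}\left(1 + \frac{v_\parallel}{c}\right)\left(1 - \frac{v^2}{c^2}\right)^{-1/2}\left(1 - \frac{R_{\rm S}}{r}\right)^{-1/2} \qquad (2)$$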

with an emitted wavelength $\lambda_{\rm emit}$, total velocity $v$ and line-of-sight velocity $v_\parallel$. We
have included the relativistic and transverse Doppler shift and the gravitational
redshift, because they could affect the line shape of the emission. We note that
equation (18) of ref. ' neglects the transverse Doppler shift, which is of the same 
order as the gravitational redshift for orbital motion. We also note that in equation (2), 
$v_\parallel < 0$ corresponds to motion towards the observer (blueshift), in keeping with the
standard radial-velocity convention. 
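The three factors named above (line-of-sight Doppler shift, transverse Doppler shift and gravitational redshift) can be written as a short function. This is a sketch under stated assumptions: velocities are in km s⁻¹, the radius is given in units of the Schwarzschild radius, the rest wavelength of 1.875 μm for Paα is an approximate value, and the cloud parameters are hypothetical.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s

def observed_wavelength(lambda_emit, v_los, v_total, r_over_rs):
    """Observed wavelength of a cloud; v_los < 0 means motion towards the observer."""
    doppler = 1.0 + v_los / C_KM_S
    transverse = 1.0 / math.sqrt(1.0 - (v_total / C_KM_S) ** 2)
    gravitational = 1.0 / math.sqrt(1.0 - 1.0 / r_over_rs)
    return lambda_emit * doppler * transverse * gravitational

LAMBDA_PA_ALPHA = 1.875  # micrometres, approximate rest wavelength (assumption)

# Hypothetical cloud orbiting at 4,000 km/s, 5,000 Schwarzschild radii out.
approaching = observed_wavelength(LAMBDA_PA_ALPHA, -3000.0, 4000.0, 5000.0)
transverse_only = observed_wavelength(LAMBDA_PA_ALPHA, 0.0, 4000.0, 5000.0)
print(f"approaching: {approaching:.4f} um, transverse: {transverse_only:.4f} um")
```

A cloud moving purely across the line of sight still appears slightly redshifted, because the transverse Doppler and gravitational terms are both greater than one.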

The spectral line shape is then found by summing the clouds in bins of observed 
wavelength. We account for the shape of the GRAVITY line spread function in 
medium resolution by binning at higher spectral resolution before convolving 
with a Gaussian with a FWHM of 4 nm. We then normalize the line strength so 
that it matches the observed strength at the peak and shift it to the observed central 
wavelength. The scaling and shift are both fixed in the analysis. 

We model the differential phase of the BLR relative to a continuum that is 
assumed to be symmetric, as implied by observed closure phases in 3C 273 being 
less than 1°. Because the size of the region (less than 100 μas) is much less than
the VLTI imaging resolution (about 3 mas), each cloud contributes a phase of
$\Delta\phi(\lambda, u, v) = -2\pi f(\lambda)/[1 + f(\lambda)]\,(u x_i + v y_i)$ in radians, where $f(\lambda)$ is the line intensity
relative to the continuum and (u, v) is the baseline separation. The Fourier
transform is linear, so we can find the total phase for each baseline by summing 
the individual phases in wavelength bins, using the same procedure as above to 
account for the instrument spectral point spread function. 

We fit the seven-parameter model to all observed 3C 273 spectral line and 
phase curves simultaneously (40 wavelength channels for the time-averaged 
spectrum and 24 baselines; data shown in Extended Data Fig. 1). The number 
of wavelength channels is chosen to fully cover the line profile as well as a 
small off-region, because the inner radius that we infer depends on the maxi- 
mum observed radial velocity in the tails of the line. We use 2 x 10° clouds in 
the model, the minimum number so that the likelihood does not vary signif- 
icantly between random instantiations. We use Bayesian statistics to measure 
confidence intervals on the model parameters. The priors are uniform over 
the following intervals: $\log(R_{\rm BLR}/\mu{\rm as}) \in (0, 4)$, $F \in (0, 1)$, $\beta \in (0, 2)$,
$\log(M_{\rm BH}/M_\odot) \in (6, 10)$, $\theta_o \in (0^\circ, 90^\circ)$, $i \in (0^\circ, 90^\circ)$ and PA $\in (0^\circ, 360^\circ)$. The posterior
is sampled using the emcee Markov chain Monte Carlo code, with a joint
likelihood including both the spectral and differential phase data across all 
spectral channels. 
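The uniform priors listed above translate into a simple box-shaped log-prior. The sketch below shows one way to encode it; the dictionary keys are hypothetical names, and the function is meant to be added to a log-likelihood before being handed to a sampler, which is not shown here.

```python
import math

# Prior bounds as quoted in the text; the key names are hypothetical.
PRIOR_BOUNDS = {
    "log_r_blr_muas": (0.0, 4.0),
    "frac_inner": (0.0, 1.0),
    "beta": (0.0, 2.0),
    "log_m_bh_msun": (6.0, 10.0),
    "theta_o_deg": (0.0, 90.0),
    "inclination_deg": (0.0, 90.0),
    "pa_deg": (0.0, 360.0),
}

def log_prior(params):
    """Return 0 inside the uniform prior box and -inf outside it."""
    for name, (low, high) in PRIOR_BOUNDS.items():
        if not low < params[name] < high:
            return -math.inf
    return 0.0

# Illustrative parameter sets, not fitted values.
inside = {"log_r_blr_muas": 1.7, "frac_inner": 0.3, "beta": 1.4,
          "log_m_bh_msun": 8.4, "theta_o_deg": 45.0,
          "inclination_deg": 12.0, "pa_deg": 210.0}
outside = dict(inside, beta=2.5)
print(log_prior(inside), log_prior(outside))
```

Returning negative infinity outside the box lets a sampler reject such proposals without evaluating the likelihood.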

The sampling is run with 1,000 ‘walkers’ independently exploring the parameter
space, and converges within about 800 trials (a total of $8 \times 10^5$ samples of the
likelihood). We checked the convergence by comparing confidence intervals taken 
from well-separated trials and by starting the fit from several initial guesses far 
from the final distribution. We report 90% confidence intervals as the 5th and 95th 
percentiles of samples in the one-dimensional distribution over each parameter. 
A corner plot of the full set of two- and one-dimensional distributions is shown 
in Extended Data Fig. 3. 
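The reported intervals are simple percentiles of the marginal posterior samples. As a hedged illustration, the sketch below computes the 5th and 95th percentiles of a stand-in chain; the Gaussian samples are synthetic and chosen only for the example, and the interpolation rule is an assumption.

```python
import random

def interval_90(samples):
    """5th and 95th percentiles using linear interpolation between ranks."""
    ordered = sorted(samples)
    n = len(ordered)

    def percentile(q):
        pos = q * (n - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])

    return percentile(0.05), percentile(0.95)

n_walkers, n_steps = 1000, 800
total_samples = n_walkers * n_steps  # number of likelihood evaluations

rng = random.Random(0)
chain = [rng.gauss(145.0, 21.0) for _ in range(20000)]  # synthetic stand-in
low, high = interval_90(chain)
print(f"{total_samples} evaluations; 90% interval = ({low:.0f}, {high:.0f})")
```

In a real analysis the chain would be the flattened sampler output for one parameter after discarding the initial, unconverged trials.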

Inferred BLR size compared with reverberation mapping. Previous reverberation
mapping studies using Hβ and Hγ have found time lags of 260–380 light
days, or a lower limit of 100 light days. Our measured value of 145 ± 35 light
days using Paα is within this range, but roughly half what is typically assumed. The
wide range in the reported values is due to the difficulty of reverberation mapping 
measurements for luminous quasars that are equatorial on the sky, such as 3C 273. 
The light curves must be combined across long seasonal observing gaps, leading 
to aliasing and other problems with irregular sampling. We believe that this is the 
main contributor to the discrepancy. In addition, about 25% of this difference can 
be explained by the half-a-magnitude higher source luminosity during the previous
observations. Other contributing factors could be a real difference between the Paα
and Balmer line-emission regions, but this seems unlikely because these lines share 
an upper atomic level. Our measured line width is slightly smaller than what has 
been measured for the other lines, implying a larger radius, but this could also be 
the result of complex kinematics that were not accounted for. 
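The size of the luminosity effect can be estimated with a one-line scaling. The sketch below assumes that the BLR radius scales as the square root of the luminosity, a commonly used form of the radius–luminosity relation that is an assumption here rather than something stated in the text.

```python
def size_ratio_from_magnitude(delta_mag, slope=0.5):
    """Ratio of BLR sizes for a source brighter by delta_mag magnitudes."""
    flux_ratio = 10.0 ** (0.4 * delta_mag)  # magnitude difference to flux ratio
    return flux_ratio ** slope              # assumed R proportional to L**slope

ratio = size_ratio_from_magnitude(0.5)
print(f"size ratio for 0.5 mag: {ratio:.3f}")
```

Half a magnitude corresponds to a flux ratio of about 1.58, so under this assumed slope the expected size is larger by roughly a quarter, of the same order as the figure quoted above.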

As discussed in the main text, our measured size is consistent with the other 
known properties of 3C 273. It is a factor of about three smaller than the hot dust 
continuum, similar to what is seen in other sources. However, it is well below the
prediction from the radius–luminosity relation based on Hβ (260–380 light days).
Other recent reverberation mapping work on samples of more luminous sources
finds smaller source sizes and a flatter radius–luminosity relation, which would
agree with our size estimate. Additional interferometry measurements, well suited 
to luminous quasars, could help to study this further. 

Limits on radial inflow and outflow. With only detections of differential phases, 
spectro-interferometry cannot distinguish between rotational and ordered radial 
motion (inflow or outflow). Here we see that the velocity gradient is nearly
perpendicular to the large-scale jet, and conclude that it indicates rotation. In a 
mixed model in which rotation and inflow/outflow are both present, the position 
angle of the velocity gradient changes depending on their relative strength. As a 
simple estimate of the allowed fraction of inflow and outflow, we assume that the 
true rotation axis aligns with the 3C 273 jet. The observed offset between the two 
would then be due to outflow, which moves the net velocity gradient towards the 
jet direction. Fitting our previous BLR model with a fixed position angle constrains 
the fraction of radial outflow at the local escape speed to be at most about 25%. 
This limit is qualitative and, for example, changes depending on the true position 
angle of the rotation axis and the implementation of the outflow model. In particular,
disk-wind models are often rotation-dominated at low velocity and would
probably not be subject to this constraint. 

Size comparison with previous 3C 273 near-infrared spectro-interferometry. 
Tentative evidence has been found for a very large BLR ($R_{\rm BLR} \approx 200$ μas) using
VLTI/AMBER. This result came from differential visibility amplitude data, in which
a claimed decrease in the wavelength-dependent change in the interferometric
amplitude implies a larger image size of the line than of the continuum. We
show the differential amplitude data from our GRAVITY observations in Extended
Data Fig. 4, averaged over our two longest baselines (UT4-1 and UT3-1) and over
all epochs. At ten-times-better precision, we see no sign of the claimed decrease
at the line. Instead, we see a clear increase in the amplitude at the line, showing
that it is more compact than the hot dust continuum emission, $R_{\rm BLR} < 150$ μas.
This compact size agrees with the small $R_{\rm BLR}$ that we inferred independently from
modelling the Paα spectral line shape and the differential visibility phases as
described above. 


Data availability 
The data were obtained at the VLTI of the European Southern Observatory (ESO), 
Paranal, Chile, and are available on the ESO archive (http://archive.eso.org/eso/ 
eso_archive_main.html) under programme IDs 099.B-0606, 0100.B-0582 and 
0101.B-0255. 
Extended Data Fig. 1 | Differential phases and u-v coverage.
a, Differential phase curves (coloured points with 1σ error bars) on the six
VLTI baselines (rows) at four epochs (columns) and time-averaged 3C 273
Paα line profile (black points, identical in each panel). The best-fitting
BLR model to all data is shown as solid lines. b, u-v coverage for all of
our data (in units of millions of λ, Mλ), with observed points
(in colour) mirrored in grey. Note the close alignment between all
baselines, particularly those without UT4, with the position angle of the
large-scale radio jet of 3C 273.
Extended Data Fig. 2 | Observed centroid positions in several
wavelength channels. Best-fitting centroids to the differential phase
data in each wavelength channel are shown as in Fig. 1, but with contour
ellipses containing 68% of the probability density. In addition, the extremal
points to the blue (on the jet axis) and red are not shown in Fig. 1, because
of their larger errors. Given the relatively low signal-to-noise ratio in each
channel, we cannot measure the radial velocity as a function of position
(for example, the rotation curve). ΔDec, declination offset; ΔRA, right
ascension offset.
Extended Data Fig. 3 | Corner plot of the BLR model parameters.
One- and two-dimensional probability density distributions from fitting
the seven-parameter BLR model to the spectral line profile and differential
phase data for 3C 273 obtained from GRAVITY. The blue points show the
parameter values at which the highest likelihood was obtained, the dashed
lines in the one-dimensional distributions are the 5% and 95% quantiles,
and contours are at 1σ, 2σ and 3σ.
Extended Data Fig. 4 | […] differential visibility amplitude (blue, ‘visamp’) for 3C 273 over all epochs
and between the two longest baselines (UT4-1 and UT3-1). Error bars are
1σ. The amplitude increases at the spectral line (black), demonstrating
that the hot dust continuum (which has a radius of about 150 μas) is
[…] of a large region size in 3C 273, and is consistent with the compact size
($R_{\rm BLR} \approx 50$ μas) that we find independently from modelling interferometric
phase and spectral-line data.
Extended Data Table 1 | GRAVITY data used for this work

Date          Resolution/Polarisation   Exposure time on 3C 273 (min)   Seeing (″)    Coherence time τ0 (ms)
2017 Jul 07   MED/COMBINED              40                              0.44–0.77     4.6–6.5
              MED/COMBINED              35                              0.45–0.58     2.8–3.8
2018 Jan 08   MED/COMBINED              40                              0.44–0.59     6.9–9.0
2018 Mar 29   MED/COMBINED              25                              0.41–0.48     11.6–15.0
2018 Apr 01   MED/COMBINED              30                              0.59–1.0      3.7–4.7
2018 Apr 02   MED/COMBINED              30                              0.46–0.85     3.2–4.9
2018 May 29   MED/COMBINED              100                             0.52–0.75     3.1–5.8
2018 May 30   MED/COMBINED              90                              0.48–0.68     2.9–4.1
Quantum control of surface acoustic-wave phonons
One of the hallmarks of quantum physics is the generation of
non-classical quantum states and superpositions, which has been 
demonstrated in several quantum systems, including ions, solid- 
state qubits and photons. However, only indirect demonstrations 
of non-classical states have been achieved in mechanical systems, 
despite the scientific appeal and technical utility of such a 
capability, including in quantum sensing, computation and
communication applications. This is due in part to the highly 
linear response of most mechanical systems, which makes 
quantum operations difficult, as well as their characteristically low 
frequencies, which hinder access to the quantum ground state.
Here we demonstrate full quantum control of the mechanical state 
of a macroscale mechanical resonator. We strongly couple a surface 
acoustic-wave resonator to a superconducting qubit, using the
qubit to control and measure quantum states in the mechanical 
resonator. We generate a non-classical superposition of the zero- 
and one-phonon Fock states and map this and other states using 
Wigner tomography. Such precise, programmable quantum
control is essential to a range of applications of surface acoustic
waves in the quantum limit, including the coupling of disparate
quantum systems.

Linear resonant systems are traditionally challenging to control at the
level of single quanta because they are always in the correspondence
limit, where quantum behaviour is indistinguishable from classical
motion. The recent advent of engineered quantum devices in the form
of qubits has enabled full quantum control over some linear systems,
in particular electromagnetic resonators. A number of experiments
have demonstrated that qubits may provide similar control
over mechanical degrees of freedom, including qubits coupled to bulk
acoustic modes, surface acoustic waves (SAWs) and flexural
modes in suspended beams. In addition, several experiments have
studied entanglement between remote mechanical modes generated via
heralding measurements and reservoir engineering. Of particular
note are experiments in which a superconducting qubit is coupled
via a piezoelectric material to a microwave-frequency bulk acoustic
mode, where the ground state can be achieved at moderate cryogenic
temperatures; such experiments include controlled vacuum Rabi swaps
between the qubit and the mechanical mode. However, the level of
quantum control and measurement has been limited by the difficulty
in engineering a single mechanical mode with sufficient coupling and
quantum state lifetime. More advanced operations, such as synthesizing
arbitrary acoustic quantum states and measuring those states using
Wigner tomography, remain a challenge. Here we report an important
advance in the level of quantum control of a mechanical device,
where we couple a superconducting qubit to a microwave-frequency
SAW resonance, demonstrating ground-state operation, vacuum Rabi
swaps between the qubit and the acoustic mode, and the synthesis of
mechanical Fock states as well as a Fock state superposition. We map
out the Wigner function for these mechanical states using qubit-based
Wigner tomography. We note that a similar achievement has been
recently reported in an experiment coupling a superconducting qubit
to a bulk acoustic mode.


The device that we use for this experiment is shown in Fig. 1. 
The superconducting qubit is a frequency-tunable planar transmon,
connected to the SAW device through a tunable inductor
network that provides electronic control of the coupling strength
$g_0$ (see Supplementary Information). Qubit rotations about the X
and Y axes in the Bloch sphere representation are performed using 
pulses on the microwave (XY) line, and Z-axis rotations are achieved 
by application of a flux bias current on the frequency-control (Z) 
line. We measure the qubit state using a dispersively coupled readout 
resonator (see Supplementary Information). The superconducting 
qubit is fabricated on a sapphire substrate with standard techniques 
(see Supplementary Information). The SAW resonator is fabricated 
separately on a lithium niobate substrate, a strong piezoelectric material
commonly used for SAW devices. The SAW resonator comprises
an interdigital transducer placed between two Bragg mirrors,
designed to support a single SAW resonance in the mirror stop band
(see Supplementary Information). The SAW wavelength λ is set by the
period of the metal lines that constitute the resonator; here, λ = 1 μm,
which corresponds to a frequency of 4.0 GHz. At the experiment 
temperature, about 10 mK, both the SAWs and the qubit should be 
in their quantum ground states. The electromechanical properties of 
the SAW resonator are modelled using an equivalent electrical circuit 
with a complex, frequency-dependent acoustic admittance $Y_{\rm a}(\omega)$ connected
in parallel with an interdigital capacitance $C_{\rm t} = 0.75$ pF. The
admittance includes the complete response of the SAW transducer 
and the interaction of the SAW with the mirrors. The strong electro- 
mechanical coupling coefficient of lithium niobate makes it feasible 
to strongly couple the SAW resonance to a standard transmon-style 
qubit (see Supplementary Information). The separate qubit and SAW- 
resonator chips are connected together in a flip-chip assembly, in which 
the lithium niobate chip is inverted, aligned and affixed to the sapphire 
chip, and are separated vertically by about 7 μm (see Supplementary
Information). Coupling between the two chips is achieved using two 
overlaid planar inductors, one on each chip. The coupling strength is 
controlled using a radio-frequency superconducting quantum interference
device (SQUID) tunable coupler, where an externally controlled
flux bias $\Phi$ controls the path of the qubit current. We note that the flip-chip
technique used here enables a wide range of future hybrid combinations
of different substrate types with superconducting or other types
of qubits; as shown below, the coherence of the qubit in this experiment 
was not affected by this approach. 
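Two of the numbers in this paragraph can be checked with elementary formulas. The sketch below uses the Bose–Einstein occupation to estimate the thermal phonon number of a 4.0 GHz mode at 10 mK, and the relation v = fλ to obtain the implied SAW phase velocity; both are textbook relations applied here as an illustration, not calculations taken from the paper.

```python
import math

H_PLANCK = 6.62607015e-34   # J s
K_BOLTZMANN = 1.380649e-23  # J / K

def thermal_occupation(freq_hz, temp_k):
    """Bose-Einstein mean occupation of a mode at frequency freq_hz."""
    return 1.0 / math.expm1(H_PLANCK * freq_hz / (K_BOLTZMANN * temp_k))

n_thermal = thermal_occupation(4.0e9, 0.010)  # 4.0 GHz mode at 10 mK
saw_speed = 4.0e9 * 1.0e-6                    # v = f * wavelength, in m/s
print(f"thermal occupation = {n_thermal:.1e}, SAW speed = {saw_speed:.0f} m/s")
```

The occupation comes out far below one, which is consistent with the expectation that both the SAW mode and the qubit sit in their ground states at the experiment temperature, provided they are well thermalized.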

We use qubit measurements to evaluate the SAW resonator. The 
qubit itself has a lifetime of $T_1 \approx 20$ μs and a Ramsey lifetime of
$T_{2,{\rm Ramsey}} \approx 2$ μs over the frequency range 3.5–4.5 GHz, measured with
the coupling $g_0$ set to zero (see Supplementary Information). Adjusting
$g_0$ away from zero shortens the qubit lifetime and makes it strongly
frequency-dependent, as the transducer converts electromagnetic
energy from the qubit into acoustic waves. In Fig. 2, we demonstrate
this with $|g_0|/2\pi$ set to 2.3 ± 0.1 MHz (all uncertainties are one standard
deviation), where acoustic loss is the dominant decay channel for the
qubit. We measure the qubit lifetime $T_1$ as a function of qubit frequency,
$\omega_{\rm ge}/2\pi$, and use it to obtain the quality factor $Q = \omega_{\rm ge}T_1$ and
Fig. 1 | Device description. a, False-colour optical photograph of the qubit
(left; blue) and SAW resonator (right; red, transducer; orange, mirrors)
connected via a tunable coupler (centre; purple). The qubit and coupler are
built on a sapphire substrate, whereas the SAW resonator is on a separate
lithium niobate substrate. In the figure, the device is viewed from below
(through the transparent sapphire substrate). The inset shows a false-colour
scanning electron micrograph of the SAW resonator; red, upper
left corner of the transducer; orange, mirror. b, Photograph showing the
flip-chip assembly. Left, the complete device. Centre, the 6 mm × 6 mm
sapphire chip. The qubit and coupler are visible near the centre of the chip,
with control wiring extending to the perimeter. Right, the 2 mm × 4 mm
lithium niobate chip with the SAW resonator (red), connected to coupling
inductors (horizontal lines). c, Circuit diagram. The microwave XY line
excites the qubit, the Z line controls the qubit frequency, the G line controls
the coupler, and the D line coherently displaces the resonator state. The
qubit, coupler and control lines are on one plane. The SAW resonator is on
a separate chip, represented by the small grey rectangle. Overlaid inductors
are mutually coupled. d, Qubit–resonator coupling, $g_0/2\pi$, calculated for
a range of coupler flux bias values $\Phi$, where $\Phi_0 = h/2e$ is the magnetic flux
quantum (h, Planck constant), using the linear circuit model shown in c
with the experimental parameters (see Supplementary Information).


the corresponding loss 1/Q. We compare our measurements to the 
results of a numerical model based on the SAW resonator design with
parameters fine-tuned to reproduce the frequency response observed in
the qubit loss (see Supplementary Information). The SAW transducer
itself can efficiently emit phonons over a wide range of frequencies,
roughly from 3.8 GHz to 4.1 GHz, owing to its small number of finger
pairs (20 pairs). The SAW mirror reflects acoustic waves efficiently in
the mirror stop band from 3.96 GHz to 4.04 GHz. The resultant interference
frustrates the transducer emission except when a resonance
condition is met, in this case at the single SAW resonance frequency
of $\omega_r/2\pi = 3.985$ GHz. The resonator admittance near that resonance
can be approximated by an equivalent resonant electrical circuit, which
constitutes the Butterworth–van Dyke model. Outside the mirror stop
band, the mirror reflection decreases rapidly, and the transducer is 
free to emit travelling phonons. The qubit sees this as increased loss, 
especially from 3.85 GHz to 3.90 GHz, where the transducer is most 
efficient. The ripples in the out-of-band mirror reflection arise from 
the finite extent of each mirror (500 lines). These features are clearly 
Fig. 2 | Characterization and modelling of SAW admittance.
a, Measured qubit loss 1/Q as a function of qubit frequency $\omega_{\rm ge}/2\pi$. Blue,
$|g_0|/2\pi$ = 2.3 ± 0.1 MHz. The purple line corresponds to minimized $g_0$.
Each data point represents 50,000–100,000 measurements. b, Real part
of the acoustic admittance of the SAW resonator, Re($Y_{\rm a}$), calculated with
a numerical model (see Supplementary Information). The red solid line
shows the admittance of the full resonator model; the SAW resonance
is the large peak at 3.985 GHz. The pink dashed line is the admittance
calculated for the transducer alone, without the mirror structure.
c, Magnitude of the acoustic reflection coefficient of the mirror model.


displayed in the measured qubit loss. The qubit also weakly couples 
to unidentified resonances near 3.8 GHz. The SAW resonance at 
3.985 GHz can resonantly and rapidly exchange energy with the qubit. 
In subsequent experiments, we avoid unwanted qubit loss by usually 
keeping the coupling small and only increasing it when deliberately 
interacting with the SAW resonance. 

We now focus on the interaction between the single SAW resonance 
and the qubit. In Fig. 3a, we illustrate the full range of qubit coupling to 
the resonance, determined using spectroscopic measurements of the 
qubit. We observe a maximum coupling of $|g_0|/2\pi$ = 7.3 ± 0.1 MHz,
which is equal to half of the avoided-crossing splitting. The ratio of
the maximum to the minimum coupling strength is measured to be
at least 300 (see Supplementary Information). Figure 3b shows time-domain
Rabi swapping of a single excitation between the qubit and the
mechanical mode, which represents a photon–phonon exchange in
each half-oscillation. A resonant swap operation is executed by setting
the qubit frequency to $\omega_r$ and turning on the coupling for approximately
37 ns. The number and amplitude of the swaps are primarily limited by
the resonator lifetime, $T_{1r}$.
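The swap duration can be compared with the ideal, lossless vacuum Rabi oscillation. The sketch below uses the standard Jaynes–Cummings result for a single excitation; it ignores decay and the finite rise time of the coupler pulse, so it is an idealization and not the model used in the paper.

```python
import math

def excited_population(t_ns, g_over_2pi_mhz, detuning_mhz=0.0):
    """Lossless qubit excited-state population for one shared excitation."""
    g = 2.0 * math.pi * g_over_2pi_mhz * 1.0e-3       # coupling in rad/ns
    delta = 2.0 * math.pi * detuning_mhz * 1.0e-3     # detuning in rad/ns
    omega = math.sqrt(4.0 * g * g + delta * delta)    # generalized Rabi frequency
    contrast = 4.0 * g * g / (omega * omega)
    return 1.0 - contrast * math.sin(0.5 * omega * t_ns) ** 2

# On resonance a full swap takes a quarter of the period 1 / (g / 2 pi).
ideal_swap_ns = 1.0e3 / (4.0 * 7.3)
print(f"ideal swap time = {ideal_swap_ns:.1f} ns")
print(f"P_e after ideal swap = {excited_population(ideal_swap_ns, 7.3):.3f}")
```

The ideal value is a few nanoseconds shorter than the quoted swap duration of approximately 37 ns; a difference of this size could plausibly come from pulse shaping, although the text does not say so.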

We show the characterization results for the single-phonon proper- 
ties of the resonator in Fig. 3c. We prepare a quantum state in the qubit, 
swap it into the resonator, wait for a delay time t, swap the state back 
into the qubit, and measure the qubit. The decay of the phonon is consistent
with an energy lifetime of $T_{1r}$ = 148 ± 1 ns and a dephasing time
of $T_{2r}$ = 293 ± 1 ns, where the ratio $T_{2r}/T_{1r} \approx 2$ is consistent with little
to no additional phase decoherence, as expected for a harmonic oscillator.
The $T_{2r}$ experiment involves generating a quantum superposition
of the resonator phonon Fock states |0⟩ and |1⟩ by performing a Rabi
swap from a qubit in the state (|g⟩ − i|e⟩)/√2, where |g⟩ and |e⟩ are
the qubit ground and excited states, respectively. The probabilities oscillate
at the idle detuning frequency Δ/2π = 53 MHz, exhibiting interference
between the resonator state and the qubit tomography pulses.
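The statement that a ratio close to two implies little additional dephasing follows from the usual rate decomposition 1/T₂ = 1/(2T₁) + 1/T_φ. The sketch below applies this relation to the quoted central values; the decomposition is a standard assumption and the resulting pure-dephasing time is only indicative, given the ±1 ns uncertainties.

```python
def pure_dephasing_time(t1, t2):
    """Pure dephasing time from 1/T2 = 1/(2 T1) + 1/T_phi (same units as inputs)."""
    rate = 1.0 / t2 - 1.0 / (2.0 * t1)
    return float("inf") if rate <= 0.0 else 1.0 / rate

t_phi_ns = pure_dephasing_time(148.0, 293.0)
print(f"T2/T1 = {293.0 / 148.0:.2f}, T_phi ~ {t_phi_ns / 1000.0:.0f} us")
```

The implied pure-dephasing time is about two orders of magnitude longer than the energy lifetime, so energy relaxation dominates the loss of coherence.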
Fig. 3 | Qubit interaction with a single mechanical mode. a, Qubit
spectroscopy under three different coupler settings: top, minimum
coupling; middle, moderate coupling, $|g_0|/2\pi$ = 2.3 ± 0.1 MHz; bottom,
maximum coupling, $|g_0|/2\pi$ = 7.3 ± 0.1 MHz. The probability $P_e$ of the
excited qubit state |e⟩ is plotted (colour scale as in b), measured with the
qubit biased at frequency $\omega_r + \Delta$ and driven with a 500-ns-long pulse at
frequency ν. Each pixel represents 1,000 repeated measurements.
b, Results of Rabi-swap experiment. The qubit is excited to |e⟩, and then it
is biased to frequency $\omega_r + \Delta$ while the coupling strength is maximized.
The qubit and resonator interact for a time τ, and the qubit state is then
measured. We plot the probability $P_e$ versus the detuning Δ and the
interaction time τ. Each pixel represents 3,000 repeated measurements.
c, Results of single-phonon experiments using the pulse sequence shown
in the inset. Top, measurement of the resonator lifetime, $T_{1r}$. The qubit is
excited to |e⟩ and that excitation is swapped into the resonator. Following a
delay time t, the state is swapped back into the qubit, and the qubit is
measured. Each data point represents 10,000 repetitions. Bottom,
measurement of the dephasing time, $T_{2r}$. The qubit is excited to
(|g⟩ − i|e⟩)/√2, which is swapped into the resonator; after a delay time
t the state is swapped back into the qubit. We then conduct qubit
tomography, using a second qubit pulse (blue, $X_{\pi/2}$; red, $Y_{\pi/2}$;
see Supplementary Information) followed by the qubit measurement. Each
data point represents 20,000 repetitions.


We attempt to create the higher Fock state |2⟩ in the SAW resonator
by exciting the qubit and swapping its excitation into the resonator
two times. We show the result in Fig. 4. The experiment is limited
by the resonator lifetime T₁ᵣ, which is comparable to the duration of
the pulse sequence used to generate |2⟩, about 100 ns. We do observe
higher-frequency oscillations in the initial interaction, as expected.
The experimental result is in excellent agreement with a numerical
master-equation model, which is fitted to the experiment by adjusting
the initial qubit and resonator states. The resonator state is closest to
|2⟩ after an interaction time of τ = 26 ns. At that time, the resonator
state calculated by the model is a statistical mixture of 47.3% |2⟩, 38.2%
|1⟩ and 14.5% |0⟩, with the unwanted lower states appearing owing to
decay during state preparation.

We now characterize the quantum state of the resonator in greater 
detail. Verifying that the resonator is indeed in its ground state is an 
Fig. 4 | Generation of the |2⟩ state. a, Qubit evolution, nominally starting
with the qubit in |e⟩ and the resonator in |1⟩. Black points, experimental
results; grey line, numerical model; red dashed line, time when the
resonator state is closest to |2⟩. The inset shows the pulse sequence used
in the experiment: the qubit is excited to |e⟩, which is swapped into the
resonator; the qubit is again excited to |e⟩, and then it interacts with the
resonator for time τ. Each point represents 3,000 repetitions. b, The
phonon number probability distribution, Pₙ, calculated by the model at the
time indicated by the red dashed line in a (τ = 26 ns).


important step in evaluating its quantum behaviour. We examine the
residual thermal populations in the qubit and resonator excited states,
|e⟩ and |1⟩, respectively, using a Rabi population measurement technique
(see Supplementary Information). Driven transitions between
|e⟩ and the second excited qubit state, |f⟩, are used to quantify the |e⟩
population by measuring the amplitudes of Rabi-like oscillations. The
experimental results are shown in Fig. 5a, where we vary the amplitude
of a microwave pulse that drives e–f transitions. In the left panel, we
show the result of probing the ground-state population of the qubit; 
the large-amplitude oscillations show near-unity initial ground-state 
population. In the right panel, we probe the excited-state population, 
which is much smaller. We calculate the excited-state population from 
the amplitudes of these oscillations (see Supplementary Information). 
When performing the experiment on the qubit alone, we observe an
excited-state population of 0.0169 ± 0.0002. To assess the thermal population
of the resonator, we first execute a resonant-swap operation,
and then we conduct the experiment again. The swap exchanges the
small excited-state populations in the resonator and the qubit. In this
case, we observe an excited-state population of 0.0049 ± 0.0002, which
we interpret as an upper bound on the excited-state population of the
resonator.

The level of control achievable in this experiment allows us to
generate the resonator states |0⟩, |1⟩, (|0⟩ + |1⟩)/√2 and, to
a lesser extent, |2⟩. We prepare these resonator states deterministically,
by exciting the qubit and transferring energy into the resonator with
resonant swaps. We use Wigner tomography to determine the fidelities
of these quantum states (see Supplementary Information), examining
the three lowest-energy states in detail. Following state preparation, we
measure the Wigner function W(α) of the resonator by using the qubit
to measure the parity of the resonator states at different complex displacements
α in the resonator phase space (see Supplementary
Information). The required displacements α are created by driving the
resonator with a resonant Gaussian microwave pulse applied to a control
line (see Fig. 1c). During the pulse, the coupling is turned off, and
the qubit is detuned above the resonator by Δ/2π = 400 MHz.

With the qubit initially in its ground state |g⟩, we allow the qubit and
resonator to resonantly interact for a time τ, and then we measure the
qubit. An example is shown in Fig. 5b. The plot of the qubit state as a
function of delay τ contains information about the displaced-resonator
state. We fit the experimental results with a numerical master-equation
model to deduce the phonon number (n) distribution of the displaced
resonator, Pₙ. We then calculate the Wigner function, which is proportional
to the phonon number parity (see Supplementary Information).
We repeat the experiment for many values of α to map out the Wigner
function. The results are displayed in Fig. 5d, along with the prediction
of the numerical model using the same pulse sequence. The value of
each pixel is determined independently. We then convert each experimental
W(α) into a density matrix ρ (see Supplementary Information).
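The step from a fitted phonon number distribution to a Wigner-function value can be sketched in a few lines. This is an illustration only: it assumes the common normalization W = (2/π) × parity (the text states only proportionality), and the function name and the input distributions are hypothetical rather than taken from the experiment.

```python
import math

def wigner_from_populations(p_n):
    """Wigner value at one displacement from the displaced-state populations.

    p_n[n] is the probability of finding n phonons after the displacement.
    Assumes the convention W = (2 / pi) * sum_n (-1)^n * p_n.
    """
    parity = sum((-1) ** n * p for n, p in enumerate(p_n))
    return 2.0 / math.pi * parity

# Hypothetical distributions at zero displacement, for illustration only
w_vacuum = wigner_from_populations([1.0])       # pure |0>: W = +2/pi
w_one = wigner_from_populations([0.0, 1.0])     # pure |1>: W = -2/pi
print(w_vacuum, w_one)
```

Repeating this for each displacement fills in one pixel of the map at a time, which matches the statement that each pixel is determined independently.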
Fig. 5 | Resonator state characterization. a, Results of Rabi population
measurement, performed to determine the steady-state |e⟩ qubit
population by driving transitions between |e⟩ and |f⟩ (see Supplementary
Information). The sequences used to probe the ground-state (left; 10,000
repetitions per data point) and excited-state (right; 200,000 repetitions per
point) populations are applied to the equilibrium qubit state (blue) and the
qubit after swapping with the resonator (red). The e–f pulse amplitude is
normalized to the amplitude that swaps |e⟩ and |f⟩, and a negative pulse
amplitude means that the pulse phase is π. b, Example Wigner tomography
result showing the evolution of the qubit as it interacts with a displaced-resonator
|1⟩ state (black points). The red line is a fit (see c). The inset
shows the synthesis of the mechanical state (indicated, for states other than
the ground state, by the sequence in parentheses) and the
pulse sequence used in Wigner tomography. The resonator state is
displaced with coherent amplitude −α. The qubit interacts with the
displaced resonator state for a time τ before it is measured. Each point
represents 3,000 repetitions. c, Example fit of the phonon number
distribution Pₙ (statistical uncertainty, 0.004) obtained from the
experimental results shown in b. d, Wigner functions W(α) of the SAW
resonator quantum states. Top, experimental results; the (|0⟩ + |1⟩)/√2
Wigner function is rotated by 90° to compensate for relative phase
accumulation during the pulse sequence. Each pixel represents 255,000
repetitions. Bottom, result of the numerical model.


From the density matrices, we calculate the quantum state fidelities to
the ideal states |ψ⟩, F = √⟨ψ|ρ|ψ⟩. We obtain F = 0.985 ± 0.005 for
|0⟩, F = 0.858 ± 0.007 for |1⟩ and F = 0.945 ± 0.006 for (|0⟩ + |1⟩)/√2.
The numerical model predicts similar fidelities: F = 0.998, 0.879 and
0.962, respectively. These experiments would benefit from a longer
phonon lifetime T₁ᵣ and larger coupling strength.
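The fidelity definition F = √⟨ψ|ρ|ψ⟩ can be evaluated directly once a density matrix is available. The sketch below is a minimal illustration in plain Python; the function name and the example density matrix are hypothetical and are not the reconstructed matrices from the experiment.

```python
import math

def fidelity(rho, psi):
    """F = sqrt(<psi|rho|psi>) for a density matrix rho and a pure state psi.

    rho is a square nested list and psi a list of amplitudes, both in the
    Fock basis |0>, |1>, ...
    """
    dim = len(psi)
    overlap = sum(
        psi[i].conjugate() * rho[i][j] * psi[j]
        for i in range(dim)
        for j in range(dim)
    )
    return math.sqrt(overlap.real)

# Hypothetical mixture: 90% |1> and 10% |0>, compared with the ideal |1>
rho_example = [[0.1, 0.0], [0.0, 0.9]]
print(fidelity(rho_example, [0.0, 1.0]))  # equals sqrt(0.9)
```

With this square-root definition, a state that is 90% in the target has a fidelity of about 0.95, so quoted fidelities are larger than the corresponding populations.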

In conclusion, we demonstrate high-fidelity, on-demand synthesis 
of quantum states in a macroscale mechanical resonator and charac- 
terize them with Wigner tomography. The primary limitation in these 
experiments is the phonon lifetime in combination with the maximum 
coupling strength. These could be improved substantially in future 
work, for example, with design and material changes in the mechanical 
resonator and adjustments to the coupling circuit. Our demonstration
involves a hybrid architecture incorporating a high-performance qubit 
with strong tunable coupling to SAWs. This scalable platform holds 
promise for future quantum acoustics experiments coupling stationary 
qubits to ‘flying’ qubits based on phonons. The technologies demon- 
strated here may also enable a wide range of experiments coupling 
superconducting circuits to diverse quantum systems, such as semi- 
conductor spin systems. 
Creation and control of multi-phonon Fock states in
a bulk acoustic-wave resonator


Yiwen Chu, Prashanta Kharel, Taekwan Yoon, Luigi Frunzio, Peter T. Rakich & Robert J. Schoelkopf


Quantum states of mechanical motion can be important resources 
for quantum information, metrology and studies of fundamental 
physics. Recent demonstrations of superconducting qubits coupled 
to acoustic resonators have opened up the possibility of performing 
quantum operations on macroscale motional modes, which can
act as long-lived quantum memories or transducers. In addition, 
they can potentially be used to test decoherence mechanisms in 
macroscale objects and other modifications to standard quantum 
theory. Many of these applications call for the ability to create
and characterize complex quantum states, such as states with a well 
defined phonon number, also known as phonon Fock states. Such 
capabilities require fast quantum operations and long coherence 
times of the mechanical mode. Here we demonstrate the controlled 
generation of multi-phonon Fock states in a macroscale bulk 
acoustic-wave resonator. We also perform Wigner tomography 
and state reconstruction to highlight the quantum nature of the 
prepared states. These demonstrations are made possible by the
long coherence times of our acoustic resonator and our ability to 
selectively couple a superconducting qubit to individual phonon 
modes. Our work shows that circuit quantum acoustodynamics
enables sophisticated quantum control of macroscale mechanical 
objects and opens up the possibility of using acoustic modes as 
quantum resources. 

Light and sound are two examples of familiar wave phenomena in 
the classical world. Until now, the field of quantum optics has exten- 
sively demonstrated the particle nature of light in quantum mechanics 
through the study of single photons and other non-Gaussian electro- 
magnetic states. The concept of particles of sound, or phonons, is used 
widely in solid state physics. However, the ability to create states of 
individual phonons has only been demonstrated in a few instances,
whereas the complete quantum tomography of such states has only
been achieved in a single trapped ion. This disparity between electromagnetic
and acoustic degrees of freedom is largely due to sound
propagating inside the complex and potentially lossy environment of a
massive material, rather than in vacuum. As a result, an open question
remains: is it feasible to control and measure complex quantum states in 
the motion of a macroscale solid-state object, or what we usually think 
of as sound, analogously to what has been done with light? 

The relatively new field of quantum acoustics attempts to answer 
this question using a variety of optomechanical and electromechanical
systems, and one particularly promising approach is circuit
quantum acoustodynamics (QAD). In analogy to circuit quantum
electrodynamics (QED), circuit QAD uses superconducting quantum
circuits that operate at microwave frequencies to manipulate and
measure mechanical resonators. Circuit QAD takes advantage of the 
strong interactions between mechanics and electromagnetism enabled 
by, for example, piezoelectricity. It also incorporates the nonlinearity 
provided by the Josephson junction, which is a crucial ingredient for 
creating non-Gaussian states of motion. In turn, the ability to create 
these states makes mechanical resonators useful as resources in quantum
circuits, offering capabilities beyond those of electromagnetic resonators.
For example, mechanical transduction is a promising method
for transferring quantum information between microwave circuits and
other systems, such as optical light or spin qubits. Owing to the
difference between the speeds of sound and light, an acoustic resonator 
is much more compact and well isolated than an electromagnetic one 
at the same frequency and provides many more independent modes 
that are individually addressable by a superconducting qubit. Such an 
architecture is desirable for simulating many-body quantum systems
and provides a highly hardware-efficient way of storing, protecting and 
manipulating quantum information using bosonic encodings.
These examples show that by repurposing the toolbox of circuit QED 
through the similarities between light and sound, circuit QAD allows 
us to make use of the important differences between these quantum 
degrees of freedom. However, in order to access this toolbox, we first 
need to demonstrate that a circuit QAD system can be engineered to 
have the necessary mode structure, strong enough interactions and 
sufficient quantum coherence to create and characterize quantum states 
of motion. 

In this work, we experimentally prepare and perform full quantum 
tomography on Fock states of phonons and their superpositions inside 
a high-overtone bulk acoustic-wave resonator (HBAR). This is enabled 
by a robust new flip-chip device geometry that couples a superconduct- 
ing transmon qubit to the HBAR. This geometry allows us to optimize 
the design of the acoustic resonator and the qubit separately to extend 
phonon coherence while enhancing the selectivity of the coupling to a 
single mode. The combination of these improvements leads to a device 
that is deeper in the strong-coupling regime of circuit QAD, which is 
necessary for the generation and manipulation of more complex quantum
states. We note that a similar demonstration using a superconducting
qubit and surface acoustic waves has been recently reported.

We now describe the motivation behind the design of our circuit 
QAD system in more detail. Figure 1a shows a schematic of our device, 
which we call the hBAR. The first important difference from our previous
device is the flip-chip geometry, where the qubit and acoustic
resonator are now on separate sapphire chips. This simplifies the
fabrication procedure and increases the yield of successful devices 
(see Supplementary Information) while allowing qubits and acoustic 
resonators to be individually tested before assembly. Second, the hBAR 
incorporates a plano-convex acoustic resonator that is fabricated using 
a simple, robust method and supports stable, transversely confined 
acoustic modes (see Supplementary Information). Because the measured
acoustic lifetime in the previous unstable resonator geometry
was consistent with being limited by diffraction loss, this modification 
to our device could considerably improve phonon coherence. Another 
important requirement is the ability to selectively couple the qubit to a 
single acoustic mode. This is partly achieved by the plano-convex reso- 
nator design, which allows us to control the frequency spacing between 
transverse modes. To further increase mode selectivity, the third 
improvement is the addition of an optimized transduction electrode 
to the qubit. The electrode was designed to match the strain profile of 
the fundamental Gaussian transverse mode of the acoustic resonator 
(see Supplementary Information). We point out that even though the 
acoustic resonator is not in physical contact with the electrode, the 
Fig. 1 | The hBAR device and strong qubit-phonon coupling. a, Top-view
(top) and side-view (bottom) schematics of the hBAR device (not
to scale). The chip containing the acoustic resonator has six nominally 
identical resonators on its edges, with Al (grey) spacers deposited on them. 
The measured thickness of the Al spacer between the qubit and acoustic 
resonator (hg) is shown, but the actual spacing may be larger owing to 
imperfections in the flip-chip assembly. Other dimensions indicated are 
the diameters of the transducer electrode (d,), curved resonator surface 
(d.) and acoustic-mode waist (dₘ), along with the thicknesses of the AlN
(h_AlN; green) and sapphire substrate (h,; blue). b, Spectroscopy results of
the transmon qubit near the l₁ acoustic mode, measured while varying the
current in an external coil used to flux-tune the qubit frequency.


electric field of the qubit extends across the gap between the two chips 
and through the AlN film, thus allowing piezoelectric transduction.
We now experimentally show that the new design does indeed lead 
to improvements in the electro-mechanical coupling, acoustic-mode 
spectrum and coherence of our device. As in our previous work, the 
hBAR is measured using a standard circuit QED setup that allows flux- 
tuning of the qubit frequency using an externally applied magnetic field. 
Figure 1b shows qubit spectroscopy results near the l = l₁ and m, n = 0
mode of the hBAR, which reveal a single distinct anticrossing feature.
Here l is the longitudinal mode number and m, n are the mode numbers
of the Hermite-Gaussian-like transverse modes. l₁ = 466 corresponds
to the highest-frequency longitudinal mode that is fully within the 
tunable range of the qubit, as indicated in Fig. 2, where we investi- 
gate the mode structure of the hBAR over several longitudinal free 
spectral ranges. Figure 2a shows the time dynamics of the qubit-pho- 
non interaction for different qubit frequencies, which reveals features 
indicative of vacuum Rabi oscillations that are spaced by the longitudinal
free spectral range ν_FSR = 13.5 MHz. Each oscillatory feature
corresponds to an anticrossing similar to the one shown in Fig. 1b. The 
Fourier transform of the data in Fig. 2a is shown in Fig. 2b and gives a 
qubit-phonon coupling rate of g₀ = 2π × (350 ± 3) kHz. In addition to
the dominant set of oscillations corresponding to the m, n = 0 Gaussian
modes, there are clear signatures of other acoustic modes in Fig. 2a, b, 
which correspond to higher-order transverse modes, as indicated by 
simulations (see Supplementary Information). However, the closest 
observable higher-order mode is about 1 MHz away from the m, n = 0
mode and about ten times less strongly coupled to the qubit, while all 
other higher-order modes are at least five times less strongly coupled. 
(From now on, we use only the longitudinal mode number to represent 
the m, n=0 modes.) These results indicate that the hBAR is a good 
approximation of a system in which the qubit can be tuned to interact 
with a single acoustic mode at a time. 

We demonstrate the improvements in the coherence of our system by 
performing quantum operations on the phonon mode using the qubit. 
Using techniques described in our previous work, we find that the
phonon mode has a lifetime of T₁ = 64 ± 2 µs, a Ramsey decoherence
time of T₂ = 38 ± 2 µs and an echo decoherence time of T₂ₑ = 45 ± 2 µs.
On other devices, we measured the phonon lifetime to be as long as
T₁ = 113 ± 4 µs. These coherence times are comparable to those of state-of-the-art
superconducting qubits and suggest that the plano-convex
resonator design does indeed support much-longer-lived phonons.
The qubit in this device has a lifetime of T₁ = 7 ± 1 µs, which is similar to that of
our previous device. As will be discussed later, we believe that these 
device parameters can be further improved through modifications of 
the materials, fabrication procedure and device geometry. 
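
As a rough guide to how the quoted phonon lifetime and Ramsey time relate, one can assume the standard decomposition of the decoherence rate into an energy-decay part and a pure-dephasing part. This relation is an assumption introduced here, not a statement from the text, and T_φ is a symbol added for illustration:

$$\frac{1}{T_\varphi} = \frac{1}{T_2} - \frac{1}{2T_1} = \frac{1}{38\ \mu\mathrm{s}} - \frac{1}{128\ \mu\mathrm{s}} \approx 0.0185\ \mu\mathrm{s}^{-1}, \qquad T_\varphi \approx 54\ \mu\mathrm{s}.$$

Under that assumption T₂ is well below the 2T₁ = 128 µs limit set by energy decay alone, so the phonon coherence in this device would be limited mainly by dephasing rather than by loss.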

The improvements presented above allow us to perform quantum 
operations on the phonon mode with a new level of sophistication, 
which we now illustrate by creating and measuring multi-phonon Fock 
states. We use a procedure for Fock state preparation that has previously 
only been demonstrated in electromagnetic systems (Fig. 3a). The
experiment begins with the qubit set to a frequency ν₀ that is detuned
by δ = −5 MHz from the target l₁ phonon mode at frequency ν₁. The
qubit ideally starts out in the ground state |g⟩, but in reality has a thermal
population of 4%–8% in the excited state |e⟩. The phonon modes,
on the other hand, have been shown to be colder. Therefore we first
perform a swap operation between the qubit and the l₂ mode with
frequency ν₂. This procedure effectively uses an additional acoustic
mode to cool the qubit to an excited-state population of about 2%. The
qubit is then excited with a π pulse and brought into resonance with
the l₁ mode to transfer its energy into the acoustic resonator using a
swap operation. This is repeated N times to climb up the Fock state
ladder and ideally results in a state of N phonons, which is then probed
by bringing the qubit and phonon to resonance for a variable time t and
measuring the final qubit state. We note that this measurement procedure
gives the total population in the qubit excited-state subspace of
the joint system and traces over the resonator state. The resulting time
dynamics of the population p_{e,N}(t) for up to N = 7 is shown in Fig. 3b.
In Fig. 3c, we plot the Fourier transform of the data in Fig. 3b. As
expected, we observe oscillations with a dominant frequency of
2g_N = 2√N g₀, which corresponds to the rate of energy exchange
between the |g, N⟩ and |e, N − 1⟩ states.
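
The √N scaling described above fixes both the duration of each swap in the ladder and the oscillation frequency expected in the Fourier transform. The following sketch works these out from the quoted coupling g₀/2π = 350 kHz, assuming ideal lossless resonant exchange; the function names are illustrative and not from the original work.

```python
import math

G0_KHZ = 350.0  # g0 / 2pi quoted in the text, in kHz

def swap_duration_us(k: int, g0_khz: float = G0_KHZ) -> float:
    """Ideal duration (microseconds) of the k-th swap, |e, k-1> -> |g, k>.

    Assumes lossless exchange at angular rate sqrt(k) * g0, so the transfer
    is complete after t = pi / (2 * sqrt(k) * g0).
    """
    g0 = 2 * math.pi * g0_khz * 1e3  # angular frequency in rad/s
    return math.pi / (2 * math.sqrt(k) * g0) * 1e6

def rabi_frequency_khz(n: int, g0_khz: float = G0_KHZ) -> float:
    """Dominant oscillation frequency 2 * sqrt(n) * g0 / 2pi, in kHz."""
    return 2 * math.sqrt(n) * g0_khz

for k in range(1, 8):
    print(k, round(swap_duration_us(k), 3), round(rabi_frequency_khz(k), 1))
```

For the first rung this gives a swap of roughly 0.71 µs and a 700 kHz oscillation. Both are short compared with the phonon coherence times but not negligible next to the qubit lifetime, in line with the later remark that qubit decay dominates the loss during a swap.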

To characterize the states that we have created more quantitatively,
we extract the population in each phonon Fock state n after performing
an N-phonon preparation. We do this by first simulating the expected
time traces, p_{e,n}(t), assuming that the phonon mode is prepared in an
ideal Fock state ranging from n = 1 to n_max = 14. The independently
measured value of g₀, along with the qubit and phonon decay and
dephasing rates, is used in the simulations. Then, the experimental
data for each N (Fig. 3b) are fitted to a weighted sum of the form

$$p_{e,N}(t) = \sum_{n=1}^{n_{\max}} p_{n,N}\, p_{e,n}(t) \qquad (1)$$

where p_{n,N} is then the population at |g, n⟩ after performing an
N-phonon preparation. The fit for each N is subject to the constraints
p_{n,N} ≤ 1 ∀ n and Σ_{n=1}^{n_max} p_{n,N} ≤ 1. Finally, the population in the zero-phonon
state is calculated as p_{0,N} = 1 − Σ_{n=1}^{n_max} p_{n,N}. Ideally, p_{n,N} = δ_{n,N}.
As shown in Fig. 3d, we observe that the distribution of populations
for each experiment is indeed peaked at n = N. However,
the population in the nominally prepared state decreases with increasing
N. We find that p_{1,1} = 0.86, which is consistent with a simple estimate
that takes into account the energy decay from the one-excitation
Fig. 2 | Mode structure of the hBAR. a, Time dynamics of the qubit-phonon
interaction, determined by exciting the qubit and measuring
its excited-state population after a variable delay as the qubit is flux- 
tuned (a.u., arbitrary units). b, Logarithm of the Fourier transform of 
the data shown in a. The qubit frequencies shown on the horizontal 
axis are determined from spectroscopy data taken at each applied flux. 
The three highest-frequency fundamental transverse modes (m, n = 0) 


manifold during a swap operation, which is dominated by the qubit 
decay rate, and the imperfect preparation of the qubit in |g). For larger 
N values, the state preparation may be affected by additional effects, 
such as off-resonant driving of the phonon mode during the qubit π
pulses, which could lead to excess population in the n > N states. We 


that are fully accessible by the qubit are shown and labelled with their 
longitudinal mode numbers. The white dashed lines indicate the 
hyperbolic dependence of the effective vacuum Rabi frequencies for the 
three most dominant modes in one free spectral range. The black dashed 
line indicates the value of 2g₀ for the fundamental mode, which is at least a
factor of five larger than the coupling rates to the other modes. 


also find that the largest source of potential error in extracting p_{N,N}
comes from uncertainty in the system parameters that are used in
simulating p_{e,n}(t). In particular, slight drifts of the qubit frequency
can result in a mismatch between the value of 2g₀ used in the simulations
and the actual oscillation frequency of the vacuum Rabi data. An
Fig. 3 | Climbing the phonon Fock state ladder. a, Pulse sequence for the
generation and measurement of phonon Fock states. T_s = π/2g₀ is the
duration of a swap operation in the one-excitation manifold. In the state
preparation step, the duration of the kth swap is scaled to account for the
coupling rate g_k = √k g₀ between the |g, k⟩ and |e, k − 1⟩ states. Pulses
intended to excite the qubit from |g⟩ to |e⟩ are labelled with 'π'. The qubit
frequencies ν₀, ν₁ and ν₂ are described in the text. b, Qubit excited-state
population after interacting with the phonon for a time t following an
N-phonon preparation procedure. Black lines are fits used to extract the
Fock state populations shown in d. c, Fourier transform of the data shown
in b, obtained by subtracting the mean of each dataset in b and appending
copies of the resultant final value to effectively smooth the Fourier
transform. The black line corresponds to 2g_N. d, Populations in the n Fock
state, extracted from b. The numbers show the populations in n = N. Error
bars indicate the result of changing the value of g₀ in the simulations of
p_{e,n}(t) by ±5 kHz (see main text).
Fig. 4 | Wigner tomography of non-classical states of motion.
a–c, Measured Wigner functions of the prepared states |1⟩,
(|0⟩ + |1⟩)/√2 and |2⟩. Each grid point is a separate experiment with
displacement by a phase-space amplitude α. d–f, Wigner functions of the
density matrices reconstructed from a–c. g–i, Cuts of the ideal Wigner


estimate of the effect of such miscalibrations is given by the error bars 
in Fig. 3d. 
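
The population extraction of equation (1) can be illustrated with a simplified stand-in. The sketch below replaces the simulated traces with ideal, decay-free oscillations sin²(√n g₀t) and uses an unconstrained least-squares fit, whereas the fit described in the text uses simulations that include decay and dephasing and enforces constraints on the populations. All function names and the input populations are hypothetical.

```python
import math

def ideal_trace(n, times, g0):
    """Qubit excited-state population for an ideal n-phonon state, no decay."""
    return [math.sin(math.sqrt(n) * g0 * t) ** 2 for t in times]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, m):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * m
    for r in reversed(range(m)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, m))
        x[r] = (aug[r][m] - tail) / aug[r][r]
    return x

def fit_populations(data, times, g0, n_max):
    """Least-squares weights for n = 1..n_max, with the inferred n = 0 first."""
    basis = [ideal_trace(n, times, g0) for n in range(1, n_max + 1)]
    gram = [[sum(x * y for x, y in zip(bi, bj)) for bj in basis] for bi in basis]
    rhs = [sum(x * y for x, y in zip(bi, data)) for bi in basis]
    p = solve(gram, rhs)
    return [1.0 - sum(p)] + p

g0 = 2 * math.pi * 0.35                 # rad/us, from g0/2pi = 350 kHz
times = [0.02 * i for i in range(500)]  # 0 to about 10 us
true_p = [0.10, 0.80, 0.05]             # hypothetical populations for n = 1, 2, 3
basis = [ideal_trace(n, times, g0) for n in (1, 2, 3)]
data = [sum(p * b[i] for p, b in zip(true_p, basis)) for i in range(len(times))]
fit = fit_populations(data, times, g0, n_max=3)
print([round(p, 3) for p in fit])
```

On this noise-free synthetic data the fit recovers the input populations. Re-running it with a slightly different g₀ in `fit_populations` than was used to generate `data` gives a feel for the miscalibration effect that the error bars in Fig. 3d are meant to capture.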

We now build on our ability to extract the phonon number distribution
to perform full Wigner tomography and explore the quantum
nature of the prepared mechanical state. As in previous circuit QED
and trapped-ion experiments, we use the definition

$$P(\alpha) = \mathrm{Tr}\left[D(-\alpha)\,\rho\,D(\alpha)\,\hat{P}\right] = \frac{\pi}{2}\,W(\alpha) \qquad (2)$$

Here P(α) and W(α) are the values of the displaced parity and Wigner
functions, respectively, at a phase-space amplitude α, ρ is the prepared
state and P̂ is the parity operator. From now on, we plot the values of
P(α) for clarity, but use the terms 'displaced parity' and 'Wigner function'
interchangeably. The resonator displacement D(α) is implemented
by a microwave pulse at the phonon frequency while the qubit is
detuned at ν₀. Under these conditions, the phonon mode is still coupled
to the microwave drive port, in part owing to its hybridization with the
qubit. To verify this and calibrate our displacement amplitudes, we first
apply a Gaussian phonon drive pulse of varying amplitude α with a
root-mean-square width of 1 µs and truncated to a total length of 4 µs.
We then measure the subsequent Fock state populations p_{n,|0⟩}(α) and
check that they agree well with the expected Poisson distributions up
to an overall scaling between the amplitudes of the applied drive and
the actual displacement (see Supplementary Information). We can then
calculate the displaced parity for the vacuum state |0⟩ using
P_{|0⟩}(α) = Σₙ (−1)ⁿ p_{n,|0⟩}(α). Similarly, we can measure the displaced
parity P_ρ(α) for an arbitrary state ρ by adding a phonon drive pulse
between state preparation and measurement.
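
For the vacuum calibration, the expected displaced parity follows from the Poisson statistics mentioned above. As a short worked step, assuming an ideal coherent state with mean phonon number |α|²:

$$P_{|0\rangle}(\alpha) = \sum_{n=0}^{\infty} (-1)^n\, e^{-|\alpha|^2}\, \frac{|\alpha|^{2n}}{n!} = e^{-|\alpha|^2}\, e^{-|\alpha|^2} = e^{-2|\alpha|^2}.$$

The displaced parity of the vacuum is therefore a Gaussian in α that equals 1 at the origin and never becomes negative, which makes it a convenient reference when calibrating the displacement amplitude.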

In Fig. 4, we present the results of Wigner state tomography on the
nominally prepared states |1⟩, (|0⟩ + |1⟩)/√2 and |2⟩. The
function (black line), data (brown points) and reconstructed Wigner
function (green line) along the Im(a) = 0 axis. Negative values of the 
Wigner function are indicators of a non-classical state of motion. The 
error bars are extracted in the same way as in Fig. 3d (see main text). 


(|0⟩ + |1⟩)/√2 state was prepared by applying a π/2 pulse to the qubit,
followed by a swap operation with the phonon mode. From the
measured data shown in Fig. 4a–c, we can reconstruct the measured
state using a maximum likelihood method (see Supplementary
Information). The Wigner functions of the reconstructed states are
presented in Fig. 4d–f. Figure 4g–i shows that the reconstructed parities
agree well with the raw data. The negativity of the Wigner functions
clearly demonstrates the quantum nature of the states. From the reconstructed
density matrices, we find that the fidelities of the prepared
states to the target states are F_{|1⟩} = 0.87 ± 0.01, F_{(|0⟩+|1⟩)/√2} = 0.94 ± 0.01
and F_{|2⟩} = 0.78 ± 0.02. The infidelity for all three states is dominated by
excess population in the lower-number Fock states, which is an
expected consequence of energy decay during state preparation and
measurement (see Supplementary Information).

These results show that the quantum state of motion in a macroscale 
mechanical resonator can be prepared, controlled and fully character- 
ized in a circuit QAD device. The demonstration of even more com- 
plex quantum states should be possible with further improvements 
of the device performance. Currently, the dominant source of loss is 
the qubit, and we find that its T₁ is higher when the resonator chip is
either not present or rotated by 180° relative to the qubit chip. This
indicates that the qubit lifetime may be limited by loss due to the AlN,
which could be mitigated by using a different piezoelectric material or
optimizing the device geometry to minimize the electric field in the
AlN that does not contribute to transduction. The current limitations
on the phonon coherence also require further investigation. The energy
loss is probably dominated by surface roughness or imperfections in the
fabricated geometry, whereas additional dephasing could result from
thermal excitations and frequency fluctuations of the detuned qubit.
In addition, we can characterize the final flip-chip geometry, such as the 
spacing and alignment between the chips, more carefully. The assembly 
process can then be modified accordingly, potentially leading to further
improvements in the coupling and mode selectivity. 

The next generation of devices could give us access to even more 
sophisticated methods for quantum control of the acoustic resona- 
tor. Our hBAR device can almost reach the strong dispersive regime 
in which circuit QED systems currently operate, which would allow 
quantum non-demolition measurements of phonon numbers and
the application of more sophisticated techniques for generating arbitrary
quantum states of harmonic resonators. Furthermore, our
qubit-cooling technique already takes advantage of the multimode 
nature of the acoustic resonator. Future experiments could, for example, 
demonstrate qubit-mediated interactions between multiple modes and 
the creation of multipartite entangled states of mechanical motion.
Recent efforts in improving the efficiency of electromechanical and 
optomechanical transduction with mechanical resonators could enable 
conversion of quantum information between the microwave and optical 
domains. Beyond the use of acoustic resonators as resources for
quantum information, the creation of increasingly complex quantum 
states in highly coherent mechanical resonators can provide insight into 
the question of whether quantum superpositions of massive objects 
are suppressed owing to mechanisms other than environmental decoherence.
In addition, the ability to apply quantum control on our
large-effective-mass, high-frequency and low-thermal-occupation 
mechanical system may put new bounds on modifications to quantum
mechanics at small length scales. These examples suggest that
the wide range of quantum acoustics demonstrations that may soon be 
possible with hBAR will give rise to new quantum technologies while 
furthering our understanding of fundamental physics. 


Data availability 
The data that support the findings of this study are available from the correspond- 
ing authors upon reasonable request. 

Superfluorescence from lead halide perovskite quantum dot superlattices


Gabriele Rainò, Michael A. Becker, Maryna I. Bodnarchuk, Rainer F. Mahrt, Maksym V. Kovalenko & Thilo Stöferle


An ensemble of emitters can behave very differently from its
individual constituents when they interact coherently via a common
light field. After excitation of such an ensemble, collective coupling
can give rise to a many-body quantum phenomenon that results in
short, intense bursts of light, so-called superfluorescence¹. Because
this phenomenon requires a fine balance of interactions between
the emitters and their decoupling from the environment, together
with close identity of the individual emitters, superfluorescence
has thus far been observed only in a limited number of systems,
such as certain atomic and molecular gases and a few solid-state
systems²⁻⁷. The generation of superfluorescent light in colloidal
nanocrystals (which are bright photonic sources practically suited
for optoelectronics⁸⁻¹⁰) has been precluded by inhomogeneous
emission broadening, low oscillator strength, and fast exciton
dephasing. Here we show that caesium lead halide (CsPbX3, X = Cl,
Br) perovskite nanocrystals¹¹⁻¹³ that are self-organized into highly
ordered three-dimensional superlattices exhibit key signatures of
superfluorescence. These are dynamically red-shifted emission
with more than 20-fold accelerated radiative decay, extension of the
first-order coherence time by more than a factor of four, photon
bunching, and delayed emission pulses with Burnham-Chiao
ringing behaviour¹⁴ at high excitation density. These mesoscopically
extended coherent states could be used to boost the performance
of opto-electronic devices¹⁵ and enable entangled multi-photon
quantum light sources¹⁶,¹⁷.

Spontaneous emission of photons, such as happens in the process
of fluorescence that is commonly used in displays and lighting, occurs
because of coupling between excited two-level systems (TLS) and the
vacuum modes of the electromagnetic field, effectively stimulated by
its zero-point fluctuations. In 1954, Dicke predicted¹⁸ that an ensemble
of N identical TLS confined in a volume smaller than about $\lambda^{3}$ (where
$\lambda$ is the corresponding emission wavelength of the TLS) can exhibit
coherent and cooperative spontaneous emission. This so-called
superradiant emission results from the coherent coupling between individual
TLS through the common vacuum modes, effectively leading to a
single giant emitting dipole from all participating TLS. Superradiant
emission has been observed in distinctly different physical systems,
such as molecular aggregates and crystals¹⁹, nitrogen vacancy centres
in diamond²⁰ and epitaxially grown quantum dots²¹ (QDs). In the case
when the excited TLS are initially fully uncorrelated, the coherence
can be established only through spontaneously triggered correlations
due to quantum fluctuations rather than by coherent excitation. When
this occurs, a so-called superfluorescence (SF) pulse is emitted¹ (Fig. 1,
illustrated for the present study). Both superradiant emission and
coherent SF bursts are characterized by an accelerated radiative decay
time $\tau_{\mathrm{SF}} \approx \tau_{\mathrm{SE}}/N$, where the exponential decay time
$\tau_{\mathrm{SE}}$ of spontaneous emission from the uncoupled TLS is shortened by
the number of coupled emitters N. In addition, SF exhibits the following
fundamental signatures, the magnitudes of which are also dependent on the
excitation density: (i) a delay or build-up time
$\tau_{\mathrm{D}} \propto \log(N)/N$ during which the emitters couple and
phase-synchronize to each other, and which corresponds to the time delay
between the excitation and onset of the cooperative emission (Fig. 1); and
(ii) coherent Rabi-type oscillations in the time domain due to the strong
light-matter interaction, known as Burnham-Chiao ringing¹⁴,²².
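
As a rough numerical illustration of the two scalings above, the following sketch evaluates the idealized decay time $\tau_{\mathrm{SF}} \approx \tau_{\mathrm{SE}}/N$ and the relative build-up time $\log(N)/N$ for a few ensemble sizes. The function names, the 400 ps single-emitter lifetime and the omitted prefactor of the delay are illustrative assumptions, not quantities taken from the analysis.

```python
import math


def sf_decay_time(tau_se_ps: float, n: int) -> float:
    """Idealized superfluorescent decay time, tau_SF ~ tau_SE / N (in ps)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return tau_se_ps / n


def relative_delay(n: int) -> float:
    """Build-up time scaling log(N)/N, up to an unspecified prefactor."""
    if n < 2:
        raise ValueError("n must be at least 2 for a non-zero delay")
    return math.log(n) / n


if __name__ == "__main__":
    tau_se_ps = 400.0  # assumed lifetime of an uncoupled emitter (ps)
    for n in (2, 10, 28, 100):
        print(n, round(sf_decay_time(tau_se_ps, n), 1), round(relative_delay(n), 4))
```

Both quantities shrink as more emitters lock together, which is why a shorter decay and a shorter delay at higher excitation density are read as signatures of a growing number of coupled emitters.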

Superfluorescence was first observed in a dense gas of hydrogen 
fluoride², and then in a limited number of solid-state systems, such as
CuCl nanocrystals (NCs) formed in a NaCl matrix³, KCl crystals doped
with peroxide anions⁴ (O₂⁻), and some select semiconductor crystals
(ZnTe and InGaAs/GaAs multi-quantum wells)⁵,⁶. Practical implementation
of such an enhanced radiative property is a persistent challenge.
Besides stringent requirements for the emissive material (for example, 
high oscillator strength, small inhomogeneous line-broadening, small 
exciton dephasing), equally important are structural, optical and device 
engineerability. Colloidal semiconductor NCs, also known as colloidal 
QDs, could fill this gap as they are structurally and optically versatile, 
and highly suited for the entire visible spectral range. Although they 
are actively pursued for photonic applications®*”3, they have not been 
reported to exhibit SF. 

Here we use colloidal NCs of caesium lead halide perovskites 
(CsPbX3, X = Cl, Br) that can be synthesized with narrow size
dispersion and are known to exhibit moderate quantum confinement
effects, resulting in narrow-band emission combined with exceptionally
large oscillator strength from a bright triplet state. In order to foster
cooperative behaviour, we employ structurally well-defined,
long-range-ordered, and densely packed arrays of such NCs, known as
superlattices, produced by means of solvent-drying-induced spontaneous
assembly. Similarly, regular arrays of II-VI semiconductor NCs
have been used to obtain collective effects in the electronic domain,
that is, band-like transport. Figure 2a outlines superlattice formation
(see also Methods), using a solution of highly monodispersed CsPbBr3
NCs with a mean size of 9.5 nm and size standard deviation of less
than 5% (Extended Data Fig. 1). In the self-assembly process, cuboidal
individual superlattice domains are formed (that is, supercrystals),
each consisting of up to several million NCs. Optical microscopy
(Fig. 2c) reveals superlattices with a lateral size of up to 5 μm, randomly
distributed in a uniform film on a 5 mm × 7 mm sample (Fig. 2d).
Transmission electron microscopy confirms that highly ordered
superlattices consist of well-separated individual NCs (Fig. 2e and Extended
Data Fig. 2). More details of the self-assembly process are reported in
the Methods section.

Fig. 1 | Schematic of the build-up process of SF. An initially uncorrelated
ensemble of TLS (randomly oriented green arrows) is excited by a light
pulse (blue arrow, top left). After time $\tau_{\mathrm{D}}$ their phases are
synchronized (aligned green arrows) such that they cooperatively emit a SF
light pulse (red arrow at right) with a characteristic decay time
$\tau_{\mathrm{SF}}$. Grey cubes represent long-range-ordered self-assembled
superlattices.

Fig. 2 | Formation of CsPbX3 (X = Cl, Br) NC superlattices by
drying-mediated self-assembly. a, Illustration of the assembly process:
a controlled evaporation (indicated by the pale blue arrows) of the solvent
leads to the formation of micrometre-size cuboidal superlattices. b,
High-resolution image (obtained using high-angle annular dark-field scanning
transmission electron microscopy, HAADF-STEM) of a single CsPbBr3
NC. c, Optical microscope image and d, photograph (under ultraviolet
light) of a layer of micrometre-sized, three-dimensional, cuboidal-shaped
NC superlattices. e, HAADF-STEM image of a single superlattice of
CsPbBr3 NCs. The cubic shape of the individual perovskite NC building
blocks is translated into the symmetry of the superlattice (simple cubic
packing). Inset, magnified view of the boxed area, showing the individual
NCs.

Figure 3a displays the photoluminescence (PL) spectrum of a single
CsPbBr3 superlattice (excited at 3.06 eV) exhibiting two emission
peaks. This and all other optical measurements were performed at a
temperature of 6 K in vacuum or in a helium atmosphere (see Methods
for details). The high-energy emission peak coincides with the
centre energy of PL from a disordered dense film of CsPbBr3 NCs
(in a glassy state) and is therefore assigned to uncoupled QDs. In
addition, a narrow, red-shifted emission peak appears in PL from
superlattices, which we assign to the emission of coupled QDs and
which is best fitted with a Lorentzian (full-width at half-maximum,
$\mathrm{FWHM}_{\mathrm{coupled}}$ = 11 meV). The peaks from the uncoupled QDs in a
superlattice and in the glassy films are best fitted with a Gaussian, as
expected for disordered ensembles. The width of the uncoupled QD
emission ($\mathrm{FWHM}_{\mathrm{uncoupled}}$ = 55 meV) is slightly broader than that of
the amorphous film ($\mathrm{FWHM}_{\mathrm{amorphous}}$ = 35 meV), which can be explained
by assuming that more 'identical QDs' within the superlattice are now
forming the peak from coupled QDs while the remaining uncoupled
ones appear more disordered than the inhomogeneous energy
distribution of the primary QD material. We can exclude the possibility that
the red-shifted feature, which is about 70 meV lower in energy than
the uncoupled QD emission, originates from emission from trions,
bi-excitons or multi-excitons because the energy shifts of these last
three species are reportedly 10-20 meV, and these would also be
observable in the disordered ensemble. The number and interaction
strength of coupled QDs determine the magnitude of the energy shift.
Statistics from 10 superlattices from different samples give an average
static red-shift of (64 ± 6) meV, average $\mathrm{FWHM}_{\mathrm{coupled}}$ = (15 ± 4) meV
and average $\mathrm{FWHM}_{\mathrm{uncoupled}}$ = (49 ± 21) meV. In most superlattices,
we observe a substructure in this red-shifted emission band, which
we attribute to the presence of several slightly different independent
domains within the same individual superlattice.

Fig. 3 | Optical properties of CsPbBr3 QD superlattices. a, PL spectrum
of a single CsPbBr3 superlattice (black solid line). The high-energy band
is assigned to the emission of uncoupled QDs. The low-energy band is the
result of the emission of coupled QDs and is not present in glassy films
of NCs (green solid line). The shaded areas are fits to the data (see main
text). b, Time-resolved PL decay of the two emission bands at 500 nJ cm⁻²
excitation fluence after applying suitable spectral filters to separate the
two components (blue curve, from uncoupled QDs; dark-red curve, from
coupled QDs). The 1/e-decay times of the two bands ($\tau_{\mathrm{coupled}}$ and
$\tau_{\mathrm{uncoupled}}$, respectively) are also indicated. With increasing
excitation fluence, the decay from the coherently coupled QDs is
substantially faster than from the uncoupled ones. Inset, power-dependence of
the 1/e-decay times of both components.

A central feature of cooperative emission is the modification of
the radiative lifetime, as demonstrated experimentally in several
quantum emitters. In time-resolved PL decay measurements at a
very low excitation fluence (5 nJ cm⁻²), we do not observe a significant
modification of the decay of the coupled QD emission compared to
that of the uncoupled QD emission (Fig. 3b inset). The absence of
accelerated emission at vanishing excitation fluence and the presence 
of the red-shifted feature in PL excitation scans (Extended Data Fig. 3) 
corroborate that the static red-shift of ~70 meV originates from
incoherent coupling of the QDs in the ground state, similar to that found in
various molecular aggregates. At a slightly higher excitation fluence
(500 nJ cm⁻² per pulse), we already observe an accelerated PL decay
of the coupled QD emission peak in comparison to the PL decay of
uncoupled QDs, with 1/e decay times of $\tau_{\mathrm{SF}}$ = 148 ps and
$\tau_{\mathrm{QD}}$ = 400 ps, respectively (Fig. 3b). In contrast to the
predominantly mono-exponential decay of the uncoupled QDs, this SF emission
decay is well approximated by a stretched exponential (see Methods section),
because the number of excited coupled emitters, and therefore the 
emission acceleration, varies during the decay. Furthermore, in con- 
trast to the uncoupled QDs, the SF decay time is strongly dependent 
on excitation power (Fig. 3b inset): this is because it scales with the 
coupling strength among the QDs, which is given by the intensity 
of the common light-field that effectively corresponds to a change 
in the number of coherently coupled QDs. When the spectrally and 
temporally integrated emission is fitted with a power law, we obtain 
an exponent of 1 (Extended Data Fig. 4), indicating that excitation- 
density-dependent non-radiative decay channels (for example, Auger 
recombination) are absent. Notably, no threshold behaviour as occurs 
for amplified spontaneous emission is observed. 
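
The exponent of such a power law can be estimated as the slope of a straight line in log-log space. The sketch below is a minimal illustration on synthetic data, not the measured values; the function name `power_law_exponent` and the sample fluences are assumptions introduced for the example.

```python
import math


def power_law_exponent(fluence, intensity):
    """Least-squares slope of log(intensity) versus log(fluence)."""
    if len(fluence) != len(intensity) or len(fluence) < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    xs = [math.log(f) for f in fluence]
    ys = [math.log(i) for i in intensity]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0.0:
        raise ValueError("fluence values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    # Synthetic example: emission strictly proportional to excitation fluence.
    fluence = [5.0, 50.0, 500.0, 5000.0]
    linear_emission = [2.0 * f for f in fluence]
    print(power_law_exponent(fluence, linear_emission))
```

An exponent close to 1 for the integrated emission means the total number of emitted photons simply tracks the number of absorbed photons, which is the basis for ruling out fluence-dependent non-radiative losses.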

The cooperative emission process strongly influences the coherence 
of the emitted light. First-order correlation measurements of each of 
the two emission peaks by means of a Michelson interferometer allow 
us to monitor the interference pattern and therefore the phase coher- 
ence time (Fig. 4a). The emission band of the uncoupled QDs exhibits 
a coherence time of 38 fs, best fitted with a Gaussian decay (Fig. 4a, 
upper graph), typical of incoherent (thermal) light sources. The emis- 
sion from the coherently coupled QDs (Fig. 4a, lower graph) exhibits 
a much longer coherence time with an exponential decay of 140 fs. 
For some superlattices, a Gaussian decay is observed (Extended Data 
Fig. 5a), which might be attributed to number fluctuations within the 
coherent SF state.

Second-order coherence of the emitted light is evinced by the
statistics of the photon arrival time on a detector. Typical coherent light,
as from a laser, shows a random (Poissonian) distribution of photon
arrival times, whereas a single TLS exhibits photon antibunching
(a sub-Poissonian distribution). In contrast, the cooperative emission
from coupled QDs leads to coherent multi-photon emission bursts.
Figure 4b reports the second-order correlation function,
$g^{(2)}(\tau) = \langle I(t) I(t+\tau) \rangle / \langle I(t) \rangle^{2}$, for both PL
emission bands, where $I(t)$ is the signal intensity at time $t$ and
$\langle I(t) \rangle$ is its statistical average. For the
uncoupled QD emission (Fig. 4b, upper graph), the plot is flat
($g^{(2)}(\tau) = 1$) because the experimental temporal resolution (40 ps) is
insufficient to resolve the expected thermal bunching. The SF emission
band, however, shows pronounced photon bunching (Fig. 4b, lower
graph) because the coherent coupling leads to the correlated emission
of multiple photons within a short time interval. Photon bunching
is only observable in superlattices with one or a few SF domains
(that is, where no substructure is visible in the red-shifted emission
band) because spectrally overlapping uncorrelated aggregated domains
within the same superlattice reduce the bunching peak's visibility, as
predicted by theory. Yet it is a robust effect that is observed with
pulsed excitation and also for mixed-halide (CsPbBr2Cl, emitting at
higher energies) QD superlattices (see Extended Data Figs. 5b and 6b,
respectively). Remarkably, some superlattices with supposedly
well-isolated coherently coupled QDs exhibit $g^{(2)}(0) > 2$ (Fig. 4 inset),
similarly to superthermal emission. The exponential decay time of
the second-order correlation is of the order of the radiative decay time
of the SF emission for low excitation densities ($\tau_{\mathrm{c}}$ = 224 ps).
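
To make the definition of the second-order correlation concrete, the sketch below evaluates the bunching model $g^{(2)}(\tau) = 1 + A\exp(-|\tau|/\tau_{\mathrm{c}})$ and computes a normalized intensity autocorrelation from a sampled intensity trace. It is an illustration only: the function names, the integer-lag discretization and the toy traces are assumptions, and a real start-stop measurement works on photon arrival times rather than a sampled intensity.

```python
import math


def g2_model(tau_ps: float, amplitude: float, tau_c_ps: float) -> float:
    """Bunching model g2(tau) = 1 + A * exp(-|tau| / tau_c)."""
    return 1.0 + amplitude * math.exp(-abs(tau_ps) / tau_c_ps)


def g2_from_trace(intensity, lag: int) -> float:
    """Normalized autocorrelation <I(t) I(t+lag)> / <I(t)>^2 at an integer lag."""
    n_pairs = len(intensity) - lag
    if lag < 0 or n_pairs <= 0:
        raise ValueError("lag must satisfy 0 <= lag < len(intensity)")
    mean = sum(intensity) / len(intensity)
    if mean == 0.0:
        raise ValueError("trace must have non-zero mean intensity")
    corr = sum(intensity[k] * intensity[k + lag] for k in range(n_pairs)) / n_pairs
    return corr / mean ** 2


if __name__ == "__main__":
    steady = [3.0] * 20          # constant intensity: no bunching
    bursty = [0.0, 0.0, 0.0, 4.0] * 5  # emission concentrated in short bursts
    print(g2_from_trace(steady, 0), g2_from_trace(bursty, 0))
```

A constant intensity gives a value of 1 at every lag, whereas a trace that concentrates the same mean intensity into short bursts gives a zero-lag value well above 1, which is the qualitative signature discussed for the SF band.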

Fig. 4 | First- and second-order coherence properties of CsPbBr3
QD superlattices. a, First-order correlation of the two emission bands
as obtained from the interference fringe visibility using a Michelson
interferometer. The high-energy band of the uncoupled QDs has a very
short phase coherence time (<40 fs, upper graph, blue symbols), whereas
the red-shifted band from the coupled QDs is characterized by much
longer phase coherence (140 fs, lower graph, dark-red symbols). The
solid lines are fits to the data (see main text). a.u., arbitrary units. Inset,
an example of the real-space interferogram. b, Second-order correlation
function, $g^{(2)}(\tau)$, obtained with a Hanbury Brown and Twiss set-up
in start-stop configuration. For the high-energy band (upper graph,
blue symbols), a flat profile with $g^{(2)}(\tau) = 1$ is observed. The red-shifted
emission band (lower graph, dark-red symbols) from the SF emission
displays a pronounced bunching peak, characteristic of the correlated
emission during a photon burst. The data are fitted to the function
$g^{(2)}(\tau) = 1 + A\exp(-|\tau|/\tau_{\mathrm{c}})$ (solid lines), where $A$ is an
amplitude prefactor, and $\tau_{\mathrm{c}}$ the characteristic decay time of the
second-order coherence. Inset, an example of superbunching with
$g^{(2)}(0) > 2$ from a single superlattice. Black open circles are
experimental data while the red curve represents the best fit to the function
described above.

SF emission exhibits very distinct characteristics in the time domain
under strong driving conditions. Figure 5a shows a streak camera
image acquired at an excitation fluence of 600 μJ cm⁻², where a finite
rise time and subsequent oscillation of the emission are observed in
addition to a much shortened radiative decay. Quantitative analysis
of spectrally integrated PL decay traces for various excitation power
densities is shown in Fig. 5b (for details see Methods). As excitation
fluence is increased, the decay time shortens to 14 ps (Fig. 5c,
upper panel). From this shortening, which is an order of magnitude
stronger than that reported for the collective emission from other QD
systems, we can estimate the average number of coherently coupled
QDs to be N = 28. This is only an effective value and is a conservative
estimate, because the energetic disorder of the QD emission energies
($\mathrm{FWHM}_{\mathrm{coupled}}$ = 11 meV) still substantially exceeds the emission peak
width of individual QDs (typically¹¹ FWHM ≈ 1 meV) and thereby
effectively reduces the coherent coupling. The SF emission undergoes
a dynamical red-shift of up to 15 meV owing to renormalization of
the emission energy from the coherent coupling, which decreases
in the course of the decay as the number of excited dipoles reduces
(Extended Data Fig. 7). The peak intensity increases superlinearly
over three orders of magnitude (Fig. 5c, middle panel) according to
a power-law dependence with an exponent of $\alpha = 1.5 \pm 0.1$, deviating
from the theoretically expected value of $\alpha = 2$, presumably owing
to saturation effects. Nevertheless, no substantial quenching of
the emission for high excitation fluences was found, verifying that
the decay remains essentially radiative (Extended Data Fig. 4d).
Furthermore, a shortening of the SF build-up time ($\tau_{\mathrm{D}}$), after which
the photon burst is emitted, is observed (Fig. 5c, bottom panel). This
characteristic of SF is a consequence of the time it takes for the
individual dipoles to become phase-locked and scales with the number
N of excited coupled QDs according to $\tau_{\mathrm{D}} \propto \log(N)/N$ (see Methods
section).

Fig. 5 | Burnham-Chiao ringing behaviour of CsPbBr3 QD
superlattices. a, Streak camera image of SF dynamics obtained with a high
excitation density of 600 μJ cm⁻². See Methods for details. b, Extracted
time-resolved emission intensity traces for five different excitation powers
(see key). Solid red lines are weighted best-fits to a model that employs a
bi-exponential decay function with damped oscillations. c, Top, effective
SF decay ($\tau_{\mathrm{SF}}$, blue circles) as a function of the excitation power density
fitted according to the SF model (solid blue line). Middle, dark-red circles
represent the peak SF emission intensity that increases superlinearly
with excitation power, corresponding to a power-law dependence with
an exponent $\alpha = 1.5 \pm 0.1$ (solid dark-red line). Bottom, the extracted
delay time $\tau_{\mathrm{D}}$ (green circles) decreases at high excitation power due to the
increased interaction among the emitters. The green solid line is the best
fit according to the model described in the Methods section. The error bars
represent the parameters' fit uncertainty.

As SF crucially depends on low decoherence and low inhomogeneous
energy variation, it should be noted that SF coupling is strongly affected
by the environment around the QDs (for example, the number of free
ligands, which are the molecules used to passivate the NC surface), by
the superlattice assembly, and by the quality of the QDs themselves.
Thus, while a large fraction of the superlattices display the red-shifted
emission from coupled QDs, the amount of photon-bunching
and Burnham-Chiao ringing varied from superlattice to superlattice.
Experiments employing different batches of NCs and superlattice
assemblies of CsPbBr3 and CsPbBr2Cl NCs (see Extended Data
Figs. 6-9) were nevertheless consistently reproducible, and further
optimization of the synthesis and assembly is likely to improve the yield of
SF domains. It is important to note that experiments on control samples with
diluted, uncoupled QDs under similar excitation conditions do not show any of
the above-mentioned signatures of SF (Extended Data Fig. 10), proving
that the observed peculiar emission characteristics of QD superlattices
arise from a genuine multi-particle effect.

Our measurements reveal that coherent SF coupling can be achieved
in long-range-ordered self-assembled superlattices of fully inorganic
CsPbX3 perovskite NCs, resulting in strong emission bursts. Colloidal
NCs and their assemblies have proven to be excellent building blocks
for a large variety of opto-electronic devices, and these cooperative
effects now allow modification of the opto-electronic properties beyond
what is possible at the individual QD level with chemical engineering
approaches. This opens up new opportunities for high-brightness and
multi-photon quantum light sources, and could enable the exploitation
of cooperative effects for long-range quantum transport and
ultra-narrow tunable lasers.
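
As a consistency check on the effective number of coupled QDs quoted in the main text (N = 28), the idealized scaling of the decay time with N can be inverted. The line below assumes that the 400 ps 1/e decay time of the uncoupled QDs is the appropriate single-emitter reference and that the 14 ps decay at the highest fluence is the fully accelerated value; both are assumptions made for illustration rather than the authors' stated procedure.

```latex
N \approx \frac{\tau_{\mathrm{uncoupled}}}{\tau_{\mathrm{SF}}}
  = \frac{400\ \mathrm{ps}}{14\ \mathrm{ps}} \approx 28.6
```

Under these assumptions the ratio is in line with the quoted effective value.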
METHODS

Synthesis of CsPbBr3 nanocrystals. In a 25 ml three-necked flask, PbBr2 (69 mg,
0.188 mmol, Aldrich, 99%) was suspended in octadecene (5 ml), dried at 100 °C for
30 min, and mixed with oleic acid (0.5 ml, vacuum-dried at 100 °C) and oleylamine
(0.5 ml, vacuum-dried at 100 °C). When the PbBr2 had dissolved, the reaction
mixture was heated up to 180°C and preheated caesium oleate in octadecene (0.4 ml, 
0.125 M) was injected. The reaction mixture was cooled immediately with an ice 
bath to room temperature. 
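
The quantities in this recipe can be cross-checked with a few lines of arithmetic. The sketch below converts the PbBr2 mass and the caesium oleate volume into millimoles; the atomic masses are standard tabulated values, and the helper names are assumptions introduced for the example.

```python
# Standard atomic masses in g/mol (tabulated values, rounded).
ATOMIC_MASS_PB = 207.2
ATOMIC_MASS_BR = 79.904


def mmol_from_mass(mass_mg: float, molar_mass_g_per_mol: float) -> float:
    """Amount in mmol: mg divided by g/mol gives mmol directly."""
    return mass_mg / molar_mass_g_per_mol


def mmol_from_solution(volume_ml: float, molarity_mol_per_l: float) -> float:
    """Amount in mmol: ml multiplied by mol/l gives mmol directly."""
    return volume_ml * molarity_mol_per_l


if __name__ == "__main__":
    molar_mass_pbbr2 = ATOMIC_MASS_PB + 2 * ATOMIC_MASS_BR
    pb_mmol = mmol_from_mass(69.0, molar_mass_pbbr2)
    cs_mmol = mmol_from_solution(0.4, 0.125)
    print(round(pb_mmol, 3), round(cs_mmol, 3), round(pb_mmol / cs_mmol, 2))
```

The PbBr2 amount reproduces the 0.188 mmol stated in the recipe, and the injected caesium oleate corresponds to 0.05 mmol, so lead is in several-fold molar excess over caesium.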

Synthesis of CsPbBr2Cl nanocrystals. In a 25 ml three-necked flask, PbBr2 (45 mg, 
0.12 mmol, Aldrich, 99%), PbCl2 (18 mg, 0.064 mmol, ABCR) and 1 ml
trioctylphosphine (Strem, 97%) were suspended in octadecene (5 ml), dried at
100 °C for 30 min, and mixed with oleic acid (0.5 ml, vacuum-dried at 100 °C) and
oleylamine (0.5 ml, vacuum-dried at 100 °C). When the PbCl2 and PbBr2 had
dissolved, the reaction
mixture was heated up to 180°C and preheated caesium oleate in octadecene 
(0.4 ml, 0.125 M) was injected. The reaction mixture was cooled immediately with 
an ice bath to room temperature. 

Purification and size-selection of CsPbX3 (X = Cl, Br) nanocrystals. A critical
factor for self-assembly of cubic-shaped CsPbX3 NCs is to start with an initially
high level of monodispersity. The crude solution was centrifuged at 12,100
revolutions per minute for 5 min, following which the supernatant was discarded, and
the precipitate was dissolved in 300 μl hexane. The hexane solution was centrifuged
again and the precipitate was discarded. The supernatant was diluted twice and
used for further purification. Subsequently, two methods of purification of the
NCs were applied: (a) 50 μl hexane, 0.6 μl oleic acid and 0.6 μl oleylamine were
added to 50 μl NCs in hexane. The colloid was destabilized by adding 50 μl acetone,
followed by centrifuging and dispersing the NCs in 300 μl toluene. This solution
was used further for the preparation of the 3D superlattices. (b) 50 μl hexane and
100 μl toluene were added to 50 μl NCs in hexane. The colloid was destabilized by
adding 50 μl acetonitrile, followed by centrifuging and dispersing the NCs in 300 μl
toluene. This solution was used further for the preparation of the 3D superlattices.

Preparation of 3D superlattices. CsPbX3 NC superlattices were prepared on glass
or on 5 mm × 7 mm silicon substrates. Shortly before the self-assembly process, the
silicon substrate was dipped into a 4% solution of HF in water for 1 min, followed
by washing with water. In a typical assembly process, the substrate was placed
in a 10 mm × 10 mm × 10 mm Teflon well and 10 μl of purified NCs in toluene
were spread onto the substrate. The well was covered with a glass slide and the
toluene was then allowed to evaporate slowly. 3D superlattices of CsPbBr3 NCs
were formed upon complete evaporation of the toluene. Typical lateral
dimensions of individual superlattices ranged from 1 μm to 10 μm, wherein some of
them are arranged into clusters of several superlattices and others remain spatially
well-isolated so that PL measurements can be performed on an individual
superlattice. Greater purification or greater polydispersity of NCs led to disordered
or 2D assemblies (glassy films). Furthermore, the formation of NC superlattices can
serve
to further narrow the size distribution and shape uniformity within the ensemble 
(with smaller or larger NCs being repelled from the NC domain), especially in the 
case of simple cubic packing of cubes, which is particularly intolerant of size and 
shape variations. 

Optical spectroscopy. All measurements were performed at cryogenic temper- 
atures (6 K). For PL, time-resolved PL, and second-order photon-correlation 
measurements on single superlattices, the sample was mounted in an evacu- 
ated liquid-helium flow cryostat on an xyz positioning stage and excited with a 
fibre-coupled excitation laser at an energy of 3.06 eV, either in continuous wave 
mode or pulsed mode with a 40 MHz repetition rate (pulse duration 50 ps). The 
excitation laser output was filtered with a short-pass filter and directed towards the 
long-working-distance 100× microscope objective (numerical aperture NA = 0.7)
by a dichroic beam splitter, resulting in a nearly Gaussian-shaped excitation spot
with a 1/e² radius of 1.4 μm. The emission was collected via the same microscope
objective and filtered using a tunable bandpass filter. For PL measurements, the 
collected light was then dispersed by a 300 lines per mm grating inside a 750 mm 
monochromator and detected by an EMCCD camera. For measurements of the 
PL decay, we filtered the emission with a tunable band-pass (FWHM = 15 nm) and 
recorded the decay with an avalanche photodiode single-photon detector with a 
time resolution of 30 ps connected to a time-correlated single-photon-counting 
system. The photon correlation was recorded using a similar set-up with two 
detectors in a Hanbury-Brown and Twiss configuration. 
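
For orientation, the excitation parameters above can be related to each other with a short calculation. The sketch below assumes that a quoted fluence refers to the peak fluence of an ideal Gaussian spot, for which the pulse energy is E = F0·π·w²/2 with w the 1/e² radius; this definition, the function names and the 500 nJ cm⁻² example value are assumptions for illustration, since the text does not state how fluence is defined.

```python
import math


def pulse_energy_nj(peak_fluence_nj_per_cm2: float, radius_um: float) -> float:
    """Pulse energy (nJ) of a Gaussian spot with 1/e^2 radius w: E = F0 * pi * w^2 / 2."""
    radius_cm = radius_um * 1e-4  # 1 um = 1e-4 cm
    return peak_fluence_nj_per_cm2 * math.pi * radius_cm ** 2 / 2.0


def pulse_period_ns(repetition_rate_mhz: float) -> float:
    """Time between consecutive pulses (ns) for a repetition rate given in MHz."""
    return 1e3 / repetition_rate_mhz


if __name__ == "__main__":
    print(pulse_period_ns(40.0))        # spacing of pulses at 40 MHz
    print(pulse_energy_nj(500.0, 1.4))  # energy per pulse for a 1.4 um spot
```

At a 40 MHz repetition rate the pulses are 25 ns apart, far longer than the sub-nanosecond decay times discussed in the main text, so each pulse excites a relaxed sample.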

To record streak camera images and first-order coherence measurements, we 
excited the sample, which was mounted in an exchange-gas cryostat at 6 K, with a 
frequency-doubled regenerative amplifier seeded with a mode-locked Ti:sapphire 
laser with a pulse duration of 100-200 fs and a repetition rate of 1 kHz at 3.1 eV. 
For both excitation and detection, we used an 80 mm lens (NA = 0.013 after iris),
resulting in an oval excitation spot area of 20 μm × 40 μm. The recorded PL was
dispersed by a grating with 150 lines per mm in a 300 mm spectrograph and detected
with a streak camera with a nominal time resolution of 2 ps and an instrument
response function FWHM of 4 ps (see Extended Data Fig. 10). First-order
coherence measurements were performed using a Michelson interferometer. Here
a non-polarizing beam splitter is used to split and recombine the light in the two 
interferometer arms, with one arm including a retroreflector on a delay stage with 
100 nm step resolution. A tunable band-pass filter is applied to select the emission 
from either the coupled or the uncoupled QDs. The interferogram was recorded 
as real-space images of the recombined and focused detection beams on a camera. 

Optical properties of superfluorescence, superradiance and subradiance. As
shown in Fig. 3b, we observed that the PL decay of the SF state is initially very
fast and cannot be described with a single exponential because the decay rate
$\Gamma$ is dependent on the number of excited TLS, $\Gamma(N) \propto N$, and therefore
decreases during the decay. Consequently, the SF decay rate should converge
towards the decay rate of the uncoupled nanocrystals. However, we observe that the
SF decay trace crosses the bi-exponential PL decay of the uncoupled QDs after 97%
of the photons are emitted due to long decay components. These long decay
components might originate from coupled QDs where the individual dipoles are out
of phase and interfere destructively, a phenomenon known as subradiance. In
ensembles with inhomogeneously broadened PL, SF and subradiant states can
coexist, and we find good agreement of the predicted excited state population with
the measured PL decay.

An out-of-phase coupling amongst the QDs is expected to result in a higher
photon energy of the subradiant state compared to the SF state. In Extended Data
Figs. 7 and 9, we provide an analysis of the dynamical energy shift observed at high 
excitation power density for CsPbBr3 and CsPbBr2Cl QD superlattices, respec- 
tively. Examples of emission spectra at different times are reported in Extended 
Data Figs. 7a and 9a for the respective QD halide compositions. In Extended Data 
Figs. 7b and 9b, we plot the fitted centre photon energy of time-sliced PL spectra 
(2 ps bin) as a function of the fitted peak area (that is, the time-dependent emission 
intensity), as obtained from excitation-power-dependent streak camera images, 
again for both CsPbBr3 and CsPbBr2Cl QD superlattices. This effectively shows 
the energetic shift of the SF state as a function of its occupation, with the different 
curves representing different initial excitation powers. The green arrows indicate 
the time sequence of the individual analysed spectral traces. By increasing the 
excitation power, we observe that the initial dynamical red-shift is largest for the 
highest excitation power, as is expected from its relationship to the number of 
excited coupled QDs. Hence, when the number of excited coupled QDs decreases 
during the decay process, the emission energy blue-shifts to higher energy, as can 
be seen in Extended Data Figs. 7c and 9c where the fitted centre photon energy 
is plotted as a function of time. We observe the most pronounced energetic blue- 
shift for the highest excitation power, resulting in a final emission with a photon 
energy that has been boosted incrementally more in comparison to the blue-shift 
for low excitation power, which is another indication of the presence of subradiant 
states that emit at higher energies. For high excitation power, the SF state becomes 
depopulated much faster since more QDs are coupled simultaneously. Then, at 
long timescales after the initial decay, the percentage of subradiant states becomes 
dominant, resulting in a blue-shift of the PL emission. 
Superfluorescence fit model. SF decay traces as in Fig. 3b cannot be fitted well
with mono- or bi-exponential functions because the decay rate is proportional
to the number of excited coupled QDs, Γ(N) ∝ N, which itself decays over time.
Furthermore, the resulting characteristic decay follows neither a stretched-
exponential nor a power-law dependence exactly, whereas the PL decay of
the uncoupled QDs is well described by a bi-exponential behaviour, where the
initial fast decay τ_QD = 349.8 ± 0.4 ps accounts for over 96% of the total emitted
photons. Nevertheless, we found that the best approximate fit to the SF decay
trace as a function of time t is the Kohlrausch stretched-exponential decay model
I(t)/I(0) = exp[−(t/τ_stretched)^β], where I(0) is the initial intensity at t = 0, τ_stretched is
the average decay time and β ∈ [0, 1] is the stretch parameter, which represents
the distribution of decay rates. Using this model to fit the SF decay curve, we
obtain an average decay time τ_stretched = 40.4 ± 0.5 ps and a stretch parameter
β = 0.457 ± 0.002.
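
As an illustration of the Kohlrausch model, the following minimal sketch evaluates I(t)/I(0) = exp[−(t/τ_stretched)^β] and recovers the two parameters from a noise-free synthetic trace through the linearisation ln(−ln(I/I(0))) = β ln(t) − β ln(τ_stretched). The function names, the time grid and the straight-line fitting route are assumptions made for this example; they are not the fitting procedure applied to the measured traces.

```python
import math

def stretched_exp(t, tau, beta):
    """Kohlrausch decay, I(t)/I(0) = exp[-(t/tau)**beta]."""
    return math.exp(-((t / tau) ** beta))

def fit_stretched_exp(times, intensities):
    """Recover (tau, beta) from a straight-line fit of ln(-ln(I/I0)) against ln(t).

    Uses ln(-ln(I/I0)) = beta*ln(t) - beta*ln(tau).
    Assumes normalised data with 0 < I/I0 < 1 and t > 0.
    """
    xs = [math.log(t) for t in times]
    ys = [math.log(-math.log(i)) for i in intensities]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    beta = sxy / sxx                      # slope of the line
    intercept = my - beta * mx            # equals -beta*ln(tau)
    tau = math.exp(-intercept / beta)
    return tau, beta

# Synthetic trace built from the fitted values quoted in the text (times in ps).
times = [5.0 * k for k in range(1, 41)]   # hypothetical grid, 5 ps to 200 ps
trace = [stretched_exp(t, 40.4, 0.457) for t in times]
tau_fit, beta_fit = fit_stretched_exp(times, trace)
```

On real, noisy data the double logarithm amplifies noise in the tail, so a direct nonlinear least-squares fit of the decay would usually be preferable; the linearised form is shown only because it makes the roles of τ_stretched and β explicit.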

At a high excitation density, as shown in Fig. 5b for CsPbBr3 QD superlattices
and in Extended Data Fig. 8b for CsPbBr2Cl QD superlattices, we observe oscilla-
tions in the decay. To model the SF decay with this characteristic ringing behaviour,
we used a decay model consisting of a bi-exponential decay that is multiplied by
a damped oscillating term 1 + B exp(−γ_Damp t)cos(ωt + φ_0), where B is the initial
amplitude of the oscillation, ω is its angular frequency, φ_0 is its initial phase and
γ_Damp is its damping constant. Furthermore, for the rising edge of the emitted pulse,
which is emitted after build-up time τ_D, we take into account a Gaussian rise term
exp{−[(t − τ_D)/τ_rise]²}, such that the complete fit function is given by:
Here, A_i are the amplitudes of the exponential decay with the corresponding decay
time constants, τ_i. Both the fast decay time and the long decay time component
(Extended Data Fig. 7d for CsPbBr3 QD superlattices and Extended Data Fig. 9d
for CsPbBr2Cl QD superlattices) decrease upon increasing the excitation density,
whereas the rise time, τ_rise = 2.4 ± 0.3 ps for CsPbBr3 QD superlattices
(τ_rise = 3.4 ± 1.0 ps for CsPbBr2Cl QD superlattices), stays approximately constant
(probably clamped by the time resolution of the set-up). In the upper panel of
Fig. 5c, we plot the power-dependent effective decay time τ_SF = (A_1τ_1 + A_2τ_2)/
(A_1 + A_2) for CsPbBr3 QD superlattices, where τ_1, τ_2 are the decay times of the
bi-exponential fit and A_1, A_2 the corresponding amplitudes. Assuming that the
initial number of coherently coupled QDs is proportional to excitation fluence P_EXC
with the proportionality constant c, the power-dependent effective decay time was
fitted with τ_SF(P_EXC) = τ_QD/(cP_EXC + 1) + y_0. Here, a fixed value of τ_QD = 400 ps was
used, obtained from the time-resolved PL measurements of uncoupled QDs, and
an additional offset y_0 was inserted to account for effects like the finite time reso-
lution. We obtain good agreement with the expected behaviour (τ_SF ∝ τ_QD/N) for
a value c_CsPbBr3 = 0.29 ± 0.04 cm² μJ⁻¹. In the lower panel of Fig. 5c, we plot τ_D for
the CsPbBr3 QD superlattices as a function of the excitation power. In our analysis, 
the build-up time is composed of the actual delay time due to the SF build-up and 
a systematic, constant time-offset because the absolute arrival time of the excitation 
pulse (which has a different wavelength from the emission) at the sample cannot 
be measured reliably at the required precision from the streak camera data. We 
observe a decrease in τ_D of about 6 ps when increasing the excitation density by
almost 2 orders of magnitude. We have fitted this behaviour for the build-up time
τ_D with τ_D = y_offset + A ln(cP_EXC + 1)/(cP_EXC + 1) because we assume that
τ_D ∝ ln(N)/N and that the number of excited coupled emitters N ∝ cP_EXC + 1 is
proportional to the excitation fluence P_EXC with the proportionality constant c.
Herein, we use a fixed value c_CsPbBr3 = 0.29 ± 0.04 cm² μJ⁻¹, which we obtained
from the fit of the effective decay in the upper panel of Fig. 5c, and an amplitude
prefactor A. The resulting fit agrees very well with the data. To obtain the absolute
time delay, we subtracted the constant offset y_offset of the time-delay fit from the
time-delay data points. SF occurs when √(τ_SF τ_D) < T_2*, where T_2* is the exciton pure
dephasing time, whereas √(τ_ASE τ_D) > T_2* signifies the amplified spontaneous emis-
sion (ASE) regime, where τ_ASE is the decay time. Considering that the coherence
time T_2 < T_2* extracted from the FWHM of single QDs is of the order of
T_2 = 6.6 ps, our measurements reveal a fast decay of about 14 ps and a delay time
of <1 ps, which satisfies the criterion for the appearance of SF.
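
The power-dependence models and the SF criterion described above can be written out as a short sketch. The function names are hypothetical. The criterion check uses the bounds quoted in the text (a decay of about 14 ps, a delay below 1 ps) and takes T_2 = 6.6 ps as a lower bound for T_2*, which is a conservative assumption of this example.

```python
import math

def effective_decay_time(p_exc, c, tau_qd, y0=0.0):
    """tau_SF(P_EXC) = tau_QD / (c*P_EXC + 1) + y0, in the units of tau_qd.

    p_exc must be given in the inverse units of c, so that c*p_exc is dimensionless.
    """
    return tau_qd / (c * p_exc + 1.0) + y0

def build_up_time(p_exc, c, amplitude, y_offset=0.0):
    """tau_D = y_offset + A*ln(c*P_EXC + 1)/(c*P_EXC + 1), that is tau_D ∝ ln(N)/N."""
    n = c * p_exc + 1.0
    return y_offset + amplitude * math.log(n) / n

def is_superfluorescent(tau_sf, tau_d, t2_star):
    """SF criterion: sqrt(tau_SF * tau_D) < T_2* (all times in the same unit)."""
    return math.sqrt(tau_sf * tau_d) < t2_star

# Fitted constants quoted in the text for CsPbBr3 (tau_QD = 400 ps, c = 0.29).
# The fluence values are hypothetical illustration points.
for fluence in (10.0, 100.0, 1000.0):
    print(fluence, round(effective_decay_time(fluence, 0.29, 400.0), 2))

# sqrt(14 ps * 1 ps) is about 3.7 ps, below the 6.6 ps bound.
print(is_superfluorescent(14.0, 1.0, 6.6))
```

Because √(14 × 1) ≈ 3.7 ps is already below 6.6 ps when the delay is set to its upper bound of 1 ps, any shorter delay satisfies the criterion as well.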

A similar analysis was performed on CsPbBr2Cl QD superlattices, as shown in
Extended Data Fig. 8. In the upper panel of Extended Data Fig. 8c we plot the power-
dependent effective decay time and fit the data with the same model as described
above, using τ_QD = 250 ps, and we obtain a value c_CsPbBr2Cl = 0.08 ± 0.01 cm² μJ⁻¹.
The peak intensity of the decay curves as a function of the excitation density is
shown in the middle panel of Extended Data Fig. 8c; it increases superlinearly,
following a power law with an exponent α_CsPbBr2Cl = 1.3 ± 0.1.
The build-up time also decreases as a function of the excitation density, as
displayed in the lower panel of Extended Data Fig. 8c. Again, we
fitted the data using the same formula for τ_D as described above with a fixed value
c_CsPbBr2Cl = 0.08 ± 0.01 cm² μJ⁻¹, and obtained good agreement.


Data availability 
The data that support the findings of this study are available from the correspond- 
ing authors upon reasonable request. 
Extended Data Fig. 1 | Quantitative analysis of CsPbBr3 NC size
distribution. a, Low-resolution TEM image of the NC material used to
prepare the superlattices. b, Histogram of NC sizes (of >100 NCs) as
obtained from TEM image analysis. The solid line is a fit with a normal
distribution, and the given mean size (9.45 nm) and standard deviation
(0.41 nm) are obtained from this fit.
Extended Data Fig. 2 | HAADF-STEM image of a single superlattice of CsPbBr3 NCs. Individual NCs (bright spots in the image) are well-resolved.
Extended Data Fig. 3 | PL excitation measurement of CsPbBr3 QD
superlattices. Using a weak, tunable excitation source, we plot the PL
intensity (black circles) obtained at 2.30 eV photon energy as a function
of excitation photon energy. The shaded areas are Gaussian peak fits. This
PL excitation measurement shows that the coupled QD feature (red) is also
present in absorption, in addition to the peak from uncoupled QDs (blue)
and energetically higher states.
Extended Data Fig. 4 | Power-dependent PL properties of CsPbBr3
QD superlattices. a, Colour-coded PL emission in the low-power
excitation regime, shown for increasing excitation fluence (in nJ cm⁻²)
of 10 (light green), 60 (light blue), 150 (yellow), 310 (dark green) and 600
(dark blue). b, PL intensity integrated over the spectral emission range
of the uncoupled QDs ('QD intensity', blue circles) and coupled QDs ('SF
intensity', dark-red circles) in a log-log plot and the total emitted intensity
(yellow circles). Fits to the data reveal linear behaviour, as represented
by fitted power-law exponents m of approximately 1. c, Colour-coded
PL emission in the high-power excitation regime, shown for increasing
excitation fluence (in μJ cm⁻²) of 330 (light green), 1,270 (light blue),
2,130 (yellow), 3,470 (dark green) and 6,330 (dark blue). d, As b, but for
the high-power excitation regime. Fits to the data reveal a power-law
behaviour with a linear increase for the SF emission, a slightly sublinear
increase for the uncoupled QDs and a less sublinear increase for the total
emitted intensity.
Extended Data Fig. 5 | Gaussian first-order coherence decay and
photon bunching in pulsed excitation of CsPbBr3 QD superlattices.
a, First-order coherence of the coupled QD emission extracted from the
fringe visibility of the interferograms as a function of delay time between
the arms of a Michelson interferometer, revealing a mixture of Gaussian
(Kubo) and exponential decay (for some of the superlattices). The red solid
line is a fit with a mixed Gaussian/exponential model function. b, Second-
order photon correlation measurement of the coupled QD emission from
a single superlattice showing photon bunching at zero delay under pulsed
excitation with a 40 MHz repetition rate.
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pass filtered PL spectra of uncoupled QD emission (blue) and coupled panel, g(r) = 1), showing a flat correlation function, and of coupled QD 
Extended Data Fig. 7 | SF decay and dynamic red-shift of CsPbBr3
QD superlattices. a, PL spectra (integrated over a 2 ps time window) at
different time delays on a semi-log scale. b, The PL spectra are fitted to a
single Gaussian peak function, and the fitted peak amplitude as a function
of the emission energy is plotted for various excitation densities (see key).
Green arrows indicate the time evolution of the emission peak. The black
dashed line denotes the mean energy at the lowest excitation density, and
the grey shaded area is the peak's FWHM. c, Fitted peak centre energy as a
function of time for various excitation densities (see key). d, Fast and slow
PL decay time components τ_1 and τ_2 of the SF bi-exponential fit model as a
function of excitation density on a semi-log plot. The error bars represent
the parameters' fit uncertainty.
Extended Data Fig. 8 | Burnham-Chiao ringing behaviour in
CsPbBr2Cl QD superlattices. a, Streak camera image of SF dynamics
obtained with a high excitation density of 1,600 μJ cm⁻². b, Extracted
time-resolved emission intensity traces for three different excitation
powers (see key). Solid lines are best-fits to a model that employs a bi-
exponential decay function with damped oscillations. c, Top, effective
SF decay time (blue circles) as a function of the excitation fluence fitted
represent the peak SF emission intensity that increases superlinearly
with excitation power, corresponding to a power-law dependence with
an exponent α = 1.3 ± 0.1 (solid dark-red line). Bottom, the extracted
delay time τ_D (green circles) decreases at high excitation power due to the
increased interaction among the emitters. The green solid line is the best
fit according to the model described in the Methods section. The error bars
represent the parameters' fit uncertainty.
Extended Data Fig. 9 | SF decay and dynamic red-shift of CsPbBr2Cl
QD superlattices. a, PL spectra (integrated over a 2 ps time window) at
different time delays on a semi-log scale. b, The PL spectra are fitted to a
single Gaussian peak function, and the fitted peak amplitude as a function
of the emission energy is plotted for various excitation densities (see key).
Green arrows indicate the time evolution of the emission peak. The black
dashed line denotes the mean energy at the lowest excitation density, and
the grey shaded area is the peak's FWHM. c, Fitted peak centre energy as a
function of time for various excitation densities (see key). d, Fast and slow
PL decay time components τ_1 and τ_2 of the SF bi-exponential fit model as a
function of excitation density. The error bars represent the parameters' fit
uncertainty.
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proves that the observed SF features cannot be explained by single 
A structural transition in physical networks

Nima Dehmamy¹, Soodabeh Milanlouei¹ & Albert-László Barabási¹,²,³*


In many physical networks, including neurons in the brain¹,
three-dimensional integrated circuits² and underground hyphal
networks³, the nodes and links are physical objects that cannot
intersect or overlap with each other. To take this into account,
non-crossing conditions can be imposed to constrain the geometry
of networks, which consequently affects how they form, evolve
and function. However, these constraints are not included in the
theoretical frameworks that are currently used to characterize real
networks⁴⁻⁷. Most tools for laying out networks are variants of the
force-directed layout algorithm⁸,⁹, which assumes dimensionless
nodes and links, and are therefore unable to reveal the geometry
of densely packed physical networks. Here we develop a modelling 
framework that accounts for the physical sizes of nodes and links, 
allowing us to explore how non-crossing conditions affect the 
geometry of a network. For small link thicknesses, we observe a 
weakly interacting regime in which link crossings are avoided via 
local link rearrangements, without altering the overall geometry 
of the layout compared to the force-directed layout. Once the link 
thickness exceeds a threshold, a strongly interacting regime emerges 
in which multiple geometric quantities, such as the total link length 
and the link curvature, scale with the link thickness. We show that 
the crossover between the two regimes is driven by the non-crossing 
condition, which allows us to derive the transition point analytically 
and show that networks with large numbers of nodes will ultimately 
exist in the strongly interacting regime. We also find that networks 
in the weakly interacting regime display a solid-like response to 
stress, whereas in the strongly interacting regime they behave in 
a gel-like fashion. Networks in the weakly interacting regime are 
amenable to 3D printing and so can be used to visualize network 
geometry, and the strongly interacting regime provides insights into 
the scaling of the sizes of densely packed mammalian brains. 

To lay out physical networks, the links and nodes must be arranged 
in such a way to avoid crossing each other, while minimizing the total 
length of the links, because long links can be costly in systems such as 
brains. In other words, we must find the shortest path for each link, 
which may not be a straight path if the straight path is obstructed by 
other nodes and links—a problem that is equivalent to stretching a 
rubber band between flexible obstacles (Fig. 1; see Supplementary 
Information section 3.A for a proof of this equivalence¹⁰).

To find the shortest path, we propose a model in which the forces that 
govern the motion of the nodes and links are determined by the gradi- 
ent of the total potential energy. We define the total potential energy as: 
where V_el is the total elastic potential of all links (l = 1, ..., L). Each link
is modelled as an elastic cylinder with radius r_L, which experiences
internal elastic forces and short-range external repulsive forces from
other links and nodes; nodes are modelled as spheres. V_NL captures the
node-link interactions at the endpoints of the links; the non-crossing
condition is ensured by a short-range repulsive force in the node-node
interaction V_NN and in the link-link interaction V_LL, which are both
modelled as short-range Gaussian potentials with strengths set by A_N
and A_L, respectively. In addition, s_l parameterizes the length of link l,
with s_l^end denoting its endpoint; x_l(s_l, t) is the position of a point along
the centre of link l at time t; X_i(t) is the position of node i (i = 1, ..., N);
r_N is the range of the node-node repulsive force; k is the elastic constant
of the links; and l ∈ ⟨i⟩ indicates that the sum is over all links connected
to node i. The potential energy in equation (1) is inspired by models
used in self-avoiding polymer chains¹¹ and manifold dynamics¹²; how-
ever, given the constraints induced by the network structure, equation
(1) has different terms and describes behaviour that is unique to net-
works.

With V_LL = 0 and replacing V_el with the elastic energy of a spring,
equation (1) reduces to the potential energy of a force-directed lay-
out (FDL) with short-range node repulsion. The lowest-energy solu-
tion of equation (1) can involve sharp bending of some links, which
we avoid by using a Gay-Berne potential¹³, as in polymer physics
(Supplementary Information section 4). Finally, we embed the network
in a high-viscosity medium, allowing it to relax to a low-energy state
without oscillations. Therefore, the node and link positions (X_i and x_l)
follow the first-order gradient-descent equations of motion:


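\[
\lambda_N \frac{\mathrm{d}X_i}{\mathrm{d}t} = -\frac{\partial V}{\partial X_i}, \qquad
\lambda_L \frac{\mathrm{d}x_l}{\mathrm{d}t} = -\left(\frac{\partial V}{\partial x_l} - \frac{\mathrm{d}}{\mathrm{d}s_l}\,\frac{\partial V}{\partial(\mathrm{d}x_l/\mathrm{d}s_l)}\right) \tag{2}
\]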
where λ_N and λ_L are the friction constants of the nodes and links
(Supplementary Information section 3.F). We use an FDL to set the
initial positions of the nodes and explore two versions of the model
with different constraints: (i) in the elastic-link model (ELI), which
corresponds to the limit λ_N → ∞, the positions of the nodes are
fixed and only the links can reorganize; (ii) in the fully elastic model
(FUEL), we assume that λ_N ≈ λ_L and hence the nodes and links are
all free to move.
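
To make the first-order dynamics concrete, the following minimal sketch relaxes a single discretised elastic link with fixed endpoints, as in the ELI limit. It is a toy under stated assumptions: the link is a chain of beads with energy V = (k/2) Σ |x_{j+1} − x_j|², the link-link and node-link repulsion and the Gay-Berne term are omitted, and the function names, step size and initial arc are hypothetical.

```python
import math

def relax_link(points, k=1.0, friction=1.0, dt=0.2, steps=5000):
    """Overdamped relaxation of a discretised elastic link with fixed endpoints.

    Interior beads follow friction * dx_j/dt = -dV/dx_j
                                             = k * (x[j+1] - 2*x[j] + x[j-1]),
    integrated with explicit Euler steps (stable here because 4*k*dt/friction < 2).
    """
    pts = [list(p) for p in points]
    for _ in range(steps):
        new = [p[:] for p in pts]
        for j in range(1, len(pts) - 1):      # endpoints stay fixed (ELI-like)
            for a in range(3):
                force = k * (pts[j + 1][a] - 2.0 * pts[j][a] + pts[j - 1][a])
                new[j][a] = pts[j][a] + dt * force / friction
        pts = new
    return pts

def link_length(points):
    """Total length of the piecewise-linear link."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))

# Hypothetical initial path: an arc between the endpoints (0,0,0) and (1,0,0).
m = 20
start = [(j / m, math.sin(math.pi * j / m), 0.0) for j in range(m + 1)]
relaxed = relax_link(start)
print(link_length(start), link_length(relaxed))
```

With no obstacles the link relaxes to the straight segment between its endpoints, which is the rubber-band picture in its simplest form; adding repulsive terms to the force would bend the path around other links and nodes.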

The network defined by equations (1) and (2) has an uneven potential-
energy landscape¹⁴ with a very large number of local minima; identi-
fying the globally optimal configuration is NP-hard (Supplementary
Information section 3.G). We therefore use simulated annealing¹⁵ to
approach an energetically favourable local minimum (Supplementary
Information section 3.G). The computational complexity of the model
is discussed in Supplementary Information section 8.C. In Fig. 1c we
show how FUEL finds the optimal three-dimensional configuration of
a lattice, helped by the thermal fluctuations from simulated annealing
that were added to the links, which allow the layout to tunnel through
the finite potential walls and escape local minima.
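
The role of thermal fluctuations in escaping local minima can be illustrated with a generic Metropolis-style simulated-annealing loop on a one-dimensional rugged energy function. This is a textbook sketch, not the annealing schedule used for the layouts: the energy function, the geometric cooling schedule, the step size and all names are assumptions chosen for the example.

```python
import math
import random

def anneal(energy, x0, step=0.5, t_start=2.0, t_end=1e-3, n_steps=20000, seed=0):
    """Metropolis simulated annealing with geometric cooling.

    Uphill moves are accepted with probability exp(-dE/T), which lets the
    state cross finite barriers while the temperature T is still high.
    Returns the lowest-energy state visited and its energy.
    """
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for i in range(n_steps):
        temp = t_start * (t_end / t_start) ** (i / (n_steps - 1))
        cand = x + rng.uniform(-step, step)
        e_cand = energy(cand)
        if e_cand <= e or rng.random() < math.exp(-(e_cand - e) / temp):
            x, e = cand, e_cand
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e

def rugged(x):
    """Hypothetical energy landscape: a parabola with many local minima."""
    return 0.1 * x * x + math.sin(5.0 * x)

x_best, e_best = anneal(rugged, x0=4.0)
print(x_best, e_best)
```

A plain gradient descent started at x0 = 4.0 would stop in the nearest local minimum, whereas the annealed walk can hop between wells before the temperature becomes too low to cross the barriers.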

Because FDLs do not take into account the physical dimensions of
the nodes and links, they typically have multiple link and node cross-
ings (Supplementary Information section 2). The number of cross-
ings increases linearly with r_L (Fig. 2a), as predicted analytically by
a geometric model (Supplementary Information section 2). To avoid
these crossings, we applied ELI and FUEL to several networks with dif-
ferent topologies (random networks and Barabási–Albert¹⁶ scale-free
Fig. 1 | Modelling framework to avoid link and node crossings. a, We
model each link as a stretched, flexible rubber band, which is represented
by many short springs connected to each other, pulled apart by elastic
forces F_el. In ELI and FUEL, the links exert a repulsive force F_LL on each
other that falls sharply for radii larger than r_L; for the FDL, F_LL = 0 and
r_L = 0. Whereas in the FDL the links may cross each other (left), in ELI and
FUEL such crossings are prohibited (right). b, A small network with N = 6
nodes laid out using the FDL (left), which results in multiple link crossings
(red links). Laying the network out using ELI (right) resolves these
crossings. c, Evolution of the total link length (main plot) and layout of the
lattice (insets) during simulated annealing, which determines the final
layout of a lattice by minimizing the total link length, starting from a
random layout with r_L < r_N. The thermal noise from the annealing helps
links to pass through each other to resolve crossings.


networks), sizes and link densities. We find that the networks undergo
a geometric transition as we increase the link thickness (Fig. 2e–h).
For small r_L (the weakly interacting regime), the ELI and FUEL lay-
outs are largely indistinguishable from the initial FDL. At low r_L, the
average link length ⟨l⟩ is independent of r_L, even as r_L increases by
orders of magnitude (Fig. 2b). This is unexpected, given that there is
an increase by a factor of ten in the number of potential link crossings
in this regime (Fig. 2a). The unchanged ⟨l⟩ indicates that ELI and FUEL
avoid the increasing number of crossings via only a small amount of
local bending of the links. Similar behaviour is seen for the average
curvature of the links ⟨C⟩. We find that ⟨C⟩ changes only modestly from
its value at the smallest r_L throughout the weakly interacting regime
(Fig. 2c), which indicates that, despite the multiple bends in some links
that are necessary to avoid crossings, the links remain mostly straight.
Note that the behaviour of ⟨C⟩ in the weakly interacting regime is
model-dependent: the movement of nodes in FUEL provides a way of avoid-
ing crossings that requires less curving of the links. Altogether, we find
that in the weakly interacting regime local link rearrangements are
sufficient to avoid the multiple crossings that are present in the FDL.
Once r_L exceeds a critical value r_L^c (the strongly interacting regime),
we observe a marked change in the geometry of the network (Fig. 2f, h).
In ELI, with fixed node positions, the links must take long, convoluted
routes outside the network to reach their end nodes because they are
unable to find sufficient space between the nodes. This change in the
link structure is particularly visible in the skeleton of the layout (white
links in Fig. 2f, h). In FUEL, with flexible node positions, the links reach
their destination by pushing the nodes away from each other. These
changes for ELI and FUEL alter the behaviour of ⟨l⟩, which in the
strongly interacting regime increases linearly with r_L. The change in
link structure also results in relatively large changes in ⟨C⟩ at r_L^c; after
the transition, ⟨C⟩ decreases as 1/r_L. Despite the different mechanisms
that underpin the two models, the scalings of ⟨l⟩ and ⟨C⟩ in the strongly
interacting regime in ELI and FUEL are independent of the network
topology. The linear increase in ⟨l⟩ and the 1/r_L decrease in ⟨C⟩ that we
observe for both layout models are consistent with isometric scaling,
indicating that the layouts in the strongly interacting regime for
different r_L are structurally similar to each other if we rescale them by r_L
(Supplementary Information section 5.A).

We determine the origin of the transition in the geometry of the
networks by estimating the transition point r_L^c. When the links are much
thinner than the node repulsion range r_N, the layout is dominated by
the repulsive forces between the nodes, which together occupy the vol-
ume V_N = 4√2 N r_N³/3 (Supplementary Information section 10). When
the volume occupied by the links becomes comparable to V_N, the layout
must change to accommodate the links. This change induces the tran-
sition from the weakly interacting regime to the strongly interacting
regime. Taking into account the volume of all nodes and links, we cal-
culate the transition point r̃^c = r_L^c/r_N to be (Supplementary Information
section 10):


1/3 
ze 6A 


= (3) 
A/3412B 


where 


A= —12(3¢k?) + f (3/2)? 1283 ) 


a 3 a 
4 n2/3 


() 


c= 


R is the radius of the layout, k is the degree of the nodes and the average
(ke? / *) is taken over the degree distribution of the network. In scale-free 
and random networks, in the limit N → ∞ we obtain r̃^c ∝ L^(−1/2) N^(1/3)
(Supplementary Information section 10). Given that in many real and
model networks L ~ mN for some constant m, we obtain r̃^c ∝ N^(−1/6);
therefore, in the limit N → ∞ we find r̃^c → 0, which implies that the
weakly interacting regime is absent in the thermodynamic limit. In
other words, in networks with a large number of nodes, the crossings 
are so numerous that they cannot be ignored. Consequently, the FDL 
and other currently used layout tools that do not consider link crossings 
are expected to be inappropriate for large physical networks because 
the layouts of such networks are dominated by crossings. 
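
The step from the large-N form of the transition point to its vanishing in the thermodynamic limit can be written out explicitly. The worked substitution below assumes the asymptotic scaling with exponents −1/2 for L and 1/3 for N, the pair that reproduces the quoted N^(−1/6) dependence once L ~ mN is inserted; the prefactor is left unspecified.

```latex
\begin{align}
  \tilde r^{\,c} &\propto L^{-1/2}\, N^{1/3}, \qquad L \simeq mN \\
  \Rightarrow\quad \tilde r^{\,c} &\propto (mN)^{-1/2}\, N^{1/3}
      = m^{-1/2}\, N^{-1/2 + 1/3}
      = m^{-1/2}\, N^{-1/6} \\
  \Rightarrow\quad \lim_{N \to \infty} \tilde r^{\,c} &= 0 .
\end{align}
```

For example, increasing N by a factor of 10⁶ at fixed m lowers the transition point by only a factor of 10, so the weakly interacting regime shrinks slowly but does disappear for sufficiently large networks.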

Although networks with different N and L transition at different r_L/r_N
ratios, if we scale r_L/r_N by r̃^c the transition occurs near unity for all
networks. Using the scaling exponent of the average link length
(φ(l) = d[log⟨l⟩]/d[log(r_L)]) as the order parameter, the data collapse
to a single curve (Fig. 2k), confirming the validity of equation (3). The
fact that the transition points of networks with different topologies
(scale-free and random networks, lattices and random geometric
graphs; Supplementary Information section 11) exhibit similar depend-
ences on r_L suggests that the transition shown in Fig. 2 is independent
of the topology and degree distribution of the network.
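
The order parameter φ(l) = d[log⟨l⟩]/d[log(r_L)] is a local slope on a log-log plot and can be estimated from tabulated data by finite differences. The sketch below is a minimal illustration on synthetic data: the idealised curve (⟨l⟩ constant below the transition and proportional to r_L above it), the grid and the function name are assumptions, not values from the simulations.

```python
import math

def scaling_exponent(r_values, l_values):
    """Estimate phi = d log<l> / d log r_L at every grid point.

    Central differences are used in the interior and one-sided
    differences at the two ends of the grid.
    """
    lx = [math.log(r) for r in r_values]
    ly = [math.log(v) for v in l_values]
    n = len(lx)
    phi = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        phi.append((ly[hi] - ly[lo]) / (lx[hi] - lx[lo]))
    return phi

# Synthetic, hypothetical data in rescaled units where the transition sits at 1:
# <l> is constant below the transition and grows linearly with r_L above it.
r = [10 ** (k / 10) for k in range(-20, 21)]
mean_l = [max(1.0, x) for x in r]
phi = scaling_exponent(r, mean_l)
print(phi[0], phi[20], phi[-1])
```

On this idealised curve the estimate is 0 deep in the weakly interacting regime, 1 deep in the strongly interacting regime and 1/2 at the kink, which is the value used to mark the transition point in Fig. 2k.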

Analysis of the effects of the size of the network on the scaling of the
order parameter (finite-size scaling analysis) indicates that the layout
transition occurs over a small, but non-zero range of r_L/r_N, regard-
less of the network size (Supplementary Information section 11). This
result suggests that we are observing a crossover¹⁷,¹⁸ from mean-field
behaviour (φ(l) = 0) to scaling behaviour (φ(l) = 1). For ELI and FUEL,
the weakly interacting regime is well described by an FDL with local
Fig. 2 | Crossover in network layouts. a, The number of link crossings in
random (circles) and scale-free (stars) networks using ELI (orange) and
FUEL (blue) is calculated assuming that the links are straight (that is,
using the FDL), and then normalized by the total number of pairs of links
to obtain the crossing fraction shown. The number of link crossings in the
FDL grows linearly with r_L (grey dashed line), saturating at very high r_L.
A physically realistic layout must resolve this increasing number of
crossings. b, In all four cases, the average link length ⟨l⟩ remains largely
constant in the weakly interacting regime, but grows linearly (dashed grey
line) in the strongly interacting regime. c, The average link curvature ⟨C⟩
increases slowly in the weakly interacting regime, with FUEL exhibiting
higher average curvature than ELI, then falls linearly (dashed grey line) in
the strongly interacting regime. d, The relaxation time of the simulated
annealing grows substantially near the transition point r̃^c = r_L^c/r_N
(vertical pink line). e–h, ELI (e, f; orange) and FUEL (g, h; blue) layouts
for a Barabási–Albert network¹⁶ with N = 20 nodes and minimum
degree m = 2. When r_L < r_N, the ELI (e) and FUEL (g) layouts are similar
to the FDL (not shown). For larger r_L, links bend to avoid each other: for

perturbations to resolve possible link crossings. However, this regime
disappears in the thermodynamic limit (N → ∞). In this limit, only the
strongly interacting regime is observed, which is dominated by strong
link-link interactions and displays universal scaling.

The crossover that we observe also alters the physical properties of
the network. For example, the response of a network to external forces
is captured by the Cauchy stress tensor¹⁹ T_μν = ∂_μ∂_νV (Supplementary
Information section 6), which depends on the physical and material
properties of the nodes and links. In the weakly interacting regime the
links are mostly straight; hence, the node terms V_NN and V_NL dominate
the total stress. Because each node is surrounded by a varying number
of other nodes, the stress does not spread uniformly in all directions
but has shear (off-diagonal) components, a common feature of solids.
In the strongly interacting regime, the links fill up the space; hence, the
link contributions V_el + V_LL dominate T_μν, resulting in a diagonal total
stress tensor (Supplementary Information section 6). In other words,
we predict that networks in the strongly interacting regime will display
ELI (f), the links do not fit inside the region containing the nodes and
make outward arcs; by contrast, because nodes are free to move for FUEL
(h), the layout behaves more gently (that is, it contains shorter links, which
bend less relative to ELI). In f and h, the bottom left parts of the image
show the full-scale networks and the top right parts show the node and
link 'skeletons', with the colours inverted, to help to visualize the geometry.
i, In the weakly interacting regime (r_L < r_N), the links are thin and the
radius of the entire layout is approximately the radius R of the bounding
sphere that surrounds the N nodes of radius r_N. l, At larger r_L/r_N, thick
links avoid crossing each other and their volume dominates the volume of
the whole layout. j, The order parameter φ(l) = d[log⟨l⟩]/d[log(r_L)]
(the scaling exponent of ⟨l⟩ ∝ r_L^φ) versus r_L/r_N for networks with different
N (50–200) and L (97–1,159) and different geometries (orange, random;
blue, scale-free). k, Rescaling the ratio r_L/r_N by r̃^c (equation (3)) collapses
the transition point, shown where φ(l) = 1/2 (red circle). This transition
occurs over a small range of r_L/r_N (pink shaded area in a–d) regardless of
the system size, providing evidence of a crossover. The black dashed curve
is a smooth fit to the order parameter.


a fluid or gel-like response to external stress. To test the validity of the 
solid—gel transition, we compress the networks generated by FUEL in 
the y direction and measure the tensile forces o,,= T,,,, (Fig. 3a; 
Supplementary Information section 6). We again observe a crossover 
at the value of r̃^c predicted by equation (3) from a roughly constant
stress in the weakly interacting regime to a monotonically increasing 
stress in the strongly interacting regime (Fig. 3b). Furthermore, 
as we rotate the network, we find that the stress ratio σ_∥/σ_⊥ displays
large fluctuations in the weakly interacting regime, behaviour that is
often observed in anisotropic solids. The fluctuations vanish at the
transition point r̃^c and the stress ratio settles to the hydrostatic ratio
σ_∥/σ_⊥ = 1/√2 (Fig. 3c), as expected for gels under pressure.

In summary, the geometry of physical networks is characterized by 
two distinct regimes: a weakly interacting regime, in which the overlap 
between the nodes and links is avoided via local link rearrangements, 
and a strongly interacting regime, the layout of which is shaped by the 
link-link expulsion. Networks in the weakly interacting regime are 
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Fig. 3 | Stressed networks and 3D printing. a, The build-up of tensile 
stress in the nodes and links as a result of compressing the network 
between two walls. Arrows indicate tensile stress components: cyan, 
parallel to the direction of compression, (x); green, perpendicular to the 
direction of compression, 0, (x). The networks are coloured on the basis of 
the total amount of stress. In the weakly interacting regime (left), the stress 
is concentrated in the nodes; in the strongly interacting regime (right), 
almost all of the stress is in the links. b, The parallel stress component oy of 
scale-free (squares) and random (circles) networks as a function of scaled 
link thickness (r,/17)/7*. Because the definition of x, y and z is frame- 
dependent, we average the forces over 50 random network orientations. 

c, The ratio of parallel and transverse tensile stress components ¢ ,/o}. 
Error bars in b and c correspond to one standard deviation around the 
mean, calculated over the 50 random orientations. In the weakly interacting 
regime, the ratio depends on the orientation of the layout (as can be seen 
from the large error bars from averaging over the orientations), which 
indicates solid-like behaviour. In the strongly interacting regime, the 
fluctuations in σ∥/σ⊥ decay, yielding a constant ratio. d, e, Visualization of 
networks. As an example, we consider a network with N= 184 and L=716 
that represents ingredients that share flavour compounds”. A three- 
dimensional rendering of the FDL (d) results in multiple crossing (red). 
The inset in d highlights a densely connected region (corresponding to 
dairy products) with a lot of overlap; consequently, it is difficult to discern 
the underlying network. By contrast, when laying out the flavour network 
using FUEL (e; printed using a commercial 3D printer), the crossings 
disappear, unveiling the inner structure of the network. 


solid-like, whereas those in the strongly interacting regime behave like 
gels. The transition that we observe between the two regimes is unique 
to three dimensions: because links are effectively one-dimensional 
objects, the non-crossing condition results in knot-like constraints in 
three dimensions, which prevent the links from passing through each 
other. In four dimensions or more, knots of one-dimensional objects 
can be untied”®, so the non-crossing conditions will not constrain the 
geometry. Therefore, three is the lowest number of dimensions in which 
links can avoid each other by bending and the highest in which they 
cannot pass by each other without breaking or tunnelling. 

Both regimes have applications. In contrast to the physical networks 
considered thus far in which the nodes and links have physical sizes, 
many networks, such as disease-gene interactions, are more abstract, 
with no real three-dimensional manifestation. In such cases, the layout 
of the network is not limited by the physical constraints of the system, 
but can be chosen in such a way to best visualize the underlying net- 
work structure. Thus, the weakly interacting regime is appropriate for 
network visualizations because it clearly separates nodes and links and 
is amenable to 3D printing, which provides a way of interacting with 
the network and exploring its inner structure directly. As an example, 
we consider a network with 184 nodes and 716 links that represents 
ingredients that share flavour compounds”’. For networks such as 
this with high link densities, two-dimensional visualizations suffer 
from visual cluttering, making only a fraction of the links visible’’. A 
three-dimensional layout may provide more clarity, but the FDL still 
exhibits node and link overlap (Fig. 3d), obstructing the details of the 
geometry of the network. By contrast, when applying FUEL and choosing 
r_L to be sufficiently small that the layout is in the weakly interacting 
regime, we obtain a geometry that reveals the underlying structure of 
the network and is amenable to 3D printing (Fig. 3e). Given that for 
large N link crossings in the FDL are inevitable, the method introduced 
here to resolve crossings will be essential as we aim to visualize large 
networks. Although the weakly interacting regime vanishes in the thermodynamic 
limit (N → ∞), for a large but finite network with a fixed 
number of nodes we will always be able to choose r_L and r_N so that we 
stay in the weakly interacting regime. 

The strongly interacting regime is directly relevant to the brain—a 
three-dimensional physical network in which the close-packing of the 
axons is critical to their ability to form synapses”””?, A scaling law of 
V,, Ai between the volume V,, and surface area Ay of the white 
matter in rodent brains has been observed previously”. This law 
implies that in these networks the average neuron length scales with 
the axon thickness as (1) = V,, / A,, x 1r,, as predicted for the strongly 
interacting regime (Fig. 2b). If we describe anatomical regions as nodes 
and axon bundles connecting the anatomical regions as links, then the 
thickness of the axon bundles r_L is comparable to the size of the anatomical 
regions. This result supports the prediction of the empirical 
scaling that these brain networks are in the strongly interacting regime. 
Thus, equations (1) and (2) provide an appropriate modelling frame- 
work to capture the geometry of dense neuronal networks, generating 
a layout that minimizes the total link length”*”° while respecting the 
non-crossing conditions that axons must obey!. 


Data availability 

All data used in the figures were generated using the simulation code available 
at https://github.com/nimadehmamy/3D-ELI-FUEL. The data that support the 
findings of this study are available from the corresponding author on reasonable 
request. 
Abrupt ice-age shifts in southern westerly winds 
and Antarctic climate forced from the north 


Christo Buizert, Michael Sigl, Mirko Severi, Bradley R. Markle, Justin J. Wettstein, Joseph R. McConnell, Joel B. Pedro, 
Harald Sodemann, Kumiko Goto-Azuma, Kenji Kawamura, Shuji Fujita, Hideaki Motoyama, Motohiro Hirabayashi, 
Ryu Uemura, Barbara Stenni, Frédéric Parrenin, Feng He, T. J. Fudge & Eric J. Steig 


The mid-latitude westerly winds of the Southern Hemisphere play 
a central role in the global climate system via Southern Ocean 
upwelling!, carbon exchange with the deep ocean”, Agulhas 
leakage (transport of Indian Ocean waters into the Atlantic)? and 
possibly Antarctic ice-sheet stability’. Meridional shifts of the 
Southern Hemisphere westerly winds have been hypothesized 
to occur®* in parallel with the well-documented shifts of the 
intertropical convergence zone’ in response to Dansgaard- 
Oeschger (DO) events, abrupt North Atlantic climate change 
events of the last ice age. Shifting moisture pathways to West 
Antarctica® are consistent with this view but may represent a Pacific 
teleconnection pattern forced from the tropics”. The full response 
of the Southern Hemisphere atmospheric circulation to the DO 
cycle and its impact on Antarctic temperature remain unclear’®. 
Here we use five ice cores synchronized via volcanic markers to 
show that the Antarctic temperature response to the DO cycle 
can be understood as the superposition of two modes: a spatially 
homogeneous oceanic ‘bipolar seesaw’ mode that lags behind 
Northern Hemisphere climate by about 200 years, and a spatially 
heterogeneous atmospheric mode that is synchronous with abrupt 
events in the Northern Hemisphere. Temperature anomalies of the 
atmospheric mode are similar to those associated with present-day 
Southern Annular Mode variability, rather than the Pacific-South 
American pattern. Moreover, deuterium-excess records suggest a 
zonally coherent migration of the Southern Hemisphere westerly 
winds over all ocean basins in phase with Northern Hemisphere 
climate. Our work provides a simple conceptual framework for 
understanding circum-Antarctic temperature variations forced 
by abrupt Northern Hemisphere climate change. We provide 
observational evidence of abrupt shifts in the Southern Hemisphere 
westerly winds, which have previously documented!-? ramifications 
for global ocean circulation and atmospheric carbon dioxide. These 
coupled changes highlight the necessity of a global, rather than a 
purely North Atlantic, perspective on the DO cycle. 
Fig. 1 | Records of abrupt glacial climate variability. a, Ice-core δ¹⁸O 
(average of GISP2 and GRIP*®) at the Greenland summit. b, WDC 
methane’. p.p.b., parts per 10⁹. c, Antarctic five-core (EDC, EDML, TAL 
and DF) average d_ln anomaly. d, Antarctic δ¹⁸O anomaly at EDML (blue), 
the Antarctic Plateau (average of DF and EDC; red) and five-core average 
(black). All records are synchronized to WD2014 chronology; Antarctic 
data are shown as anomalies relative to the present, with the EDML and 
Antarctic Plateau lines offset by +1.4‰ and −1.3‰, respectively, for 
clarity. DO interstadial periods are marked in grey and numbered, and 
Heinrich stadials (HS1-HS5) are marked in blue. Isotope ratios are on the 
VSMOW (Vienna Standard Mean Ocean Water) scale. Thin lines show 
records at the original resolution (ranging from about 5 to about 50 yr), 
and thick lines are a moving average (with a 300-yr and 150-yr window for 
Antarctic and Greenland data, respectively). 
Fig. 2 | Antarctic climate response to DO warming. a, Stack of North 
Greenland Ice Core Project (NGRIP) δ¹⁸O records. b, Stack of Antarctic 
δ¹⁸O records at the indicated locations, with ‘Plateau’ representing the 
average of DF and EDC. c, As in b, but with the five-core mean subtracted. 
d, First two principal components of the Antarctic δ¹⁸O stacks (1,500-yr 
window), with the percentage of variance provided (PC2 is offset by −0.15 
for clarity). PC1 is strongly correlated (r = 0.998) to the Antarctic mean. A 
linear fit to PC1 (interval from t = −400 yr to t = 0) is shown to highlight 
the response around t = 0. e, Proposed oceanic and atmospheric modes, 


During the glacial DO cycle, abrupt variations in northward heat 
transport by the Atlantic meridional overturning circulation (AMOC) 
affect Greenland and Antarctic temperatures oppositely (Fig. 1), via an 
oceanic teleconnection called the bipolar seesaw®"'. Antarctica warms 
during Greenland cold phases (stadials) and cools during Greenland 
warm phases (interstadials), with the gradual nature of Antarctic cli- 
mate change reflecting buffering by a large heat reservoir''—probably 
the global ocean interior®. The DO cycle also affects atmospheric circu- 
lation; the intertropical convergence zone shifts southwards during sta- 
dials and northwards during interstadials’. General circulation model 
(GCM) simulations suggest parallel shifts of the Southern Hemisphere 
westerly winds (westerlies)**', but the available observational evidence 
(a deuterium-excess record from West Antarctica’) cannot distinguish 
between such shifts and Pacific-only teleconnections®. Furthermore, 
the impact of the atmospheric circulation changes on Antarctic climate 
remains unknown, and models are inconclusive on this question!°?. 

We use water stable isotope ratios, a proxy for site temperature’, from 
five Antarctic ice cores: WAIS (West Antarctic Ice Sheet) Divide (WDC), 
obtained using rotated principal component vectors (Methods). Fits 
from change-point analysis (Methods) are also shown. The oceanic mode 
responds at t = 211 ± 42 yr and the atmospheric mode at t = 28 ± 40 yr (1σ 
bounds; Extended Data Table 1). Curves in d and e are normalized to have 
2σ = 1. f, g, Empirical orthogonal functions EOF1 and EOF2 associated 
with PC1 and PC2 in d, scaled to δ¹⁸O units (‰). h, Spatial pattern 
associated with the atmospheric mode shown in e, scaled to δ¹⁸O units 
(‰). Isotope ratios are on the VSMOW scale. 


EPICA (European Project for Ice Coring in Antarctica) Dronning 
Maud Land (EDML), EPICA Dome C (EDC), Dome Fuji (DF) and 
Talos Dome (TAL). WDC is synchronized to Greenland ice cores at 
high precision via atmospheric methane (Fig. 1a, b)!°; here we synchro- 
nize WDC to the other cores in the interval 10-57 kyr before present 
(BP) using volcanic markers (Methods; Extended Data Fig. 1), greatly 
improving our ability to study the timing of regional Antarctic climate 
variations relative to Greenland. We investigate the Antarctic response 
to DO events using a stacking technique’ in which 19 individual events 
are aligned at the midpoint of their abrupt methane transition in WDC 
and averaged to obtain the shared climatic signal (Methods). 
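
The stacking step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `stack_events`, the 10-yr resampling step, the 500-yr half-window and the synthetic record are all assumptions made for the example, and the time axis is assumed to run forward in years.

```python
import numpy as np

def stack_events(time, values, event_times, half_window=500.0, step=10.0):
    """Average a record over several events after aligning each event at lag 0.

    `time` must be increasing (years, forward in time), `values` is the record
    sampled at those times, and `event_times` are the alignment points.
    All names and defaults here are illustrative assumptions.
    """
    lags = np.arange(-half_window, half_window + step, step)
    aligned = []
    for t0 in event_times:
        # Resample the record onto a common lag axis centred on this event;
        # samples that fall outside the record are marked as missing.
        aligned.append(np.interp(t0 + lags, time, values, left=np.nan, right=np.nan))
    # Average across events, ignoring missing samples.
    return lags, np.nanmean(np.vstack(aligned), axis=0)

# Synthetic record that steps from 0 to 1 at each hypothetical event.
time = np.arange(0.0, 10001.0, 10.0)
values = ((time % 3000.0) >= 2000.0).astype(float)
lags, stacked = stack_events(time, values, event_times=[2000.0, 5000.0, 8000.0])
```

Averaging aligned events suppresses noise that is uncorrelated between events, so the stacked curve keeps only the shared response around lag 0.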
Antarctica cools in response to DO warming (Fig. 2a, b), consistent 
with the bipolar seesaw theory®!!. In the Antarctic mean δ¹⁸O stack 
(where δ¹⁸O represents the ¹⁸O/¹⁶O composition), the cooling onset 
occurs about two centuries after the abrupt Northern Hemisphere 
event, providing validation of earlier results from West Antarctica!>. 
There is a spatial pattern to the Antarctic response, however. A step-like 
divergence from the mean signal is seen around t ≈ 0 yr (that is, 
Fig. 3 | Deuterium excess and the Southern Hemisphere westerlies. 
a, DO warming. Shown are data from the NGRIP δ¹⁸O stack (turquoise); 
five-core average Antarctic d_ln stack (orange, with fit from change-point 
analysis; see Extended Data Table 1); the SAM index (here the 
leading principal component of sea-level pressure variability south of 
20° S) following a freshwater-forced AMOC perturbation in CCSM3 
(Community Climate System Model version 3) simulations (grey, with fit 
from change-point analysis; see Extended Data Table 1). b, As in a, but 
for DO cooling. c, Magnitude of Antarctic d_ln response to DO warming. 
The weak d_ln trend before and after the abrupt jump probably reflects the 
SST of Southern Hemisphere vapour source waters following the thermal 
bipolar seesaw*"'. d, As in c, but for DO cooling. Isotope ratios are on the 
VSMOW scale. 


synchronous with Northern Hemisphere climate), with the interior 
East Antarctic Plateau sites (DF and EDC) warming and EDML cooling 
(Fig. 2c). This instantaneous warming over the plateau is particularly 
pronounced at DO events 1, 8, 12 and 14 (Fig. 1d, red curve). 

Using principal component analysis (PCA; see Methods), we find 
that two modes of variability explain more than 96% of signal var- 
iance in the five stacked records (Fig. 2d). The first principal com- 
ponent (PC1; 83% of variance explained) has the triangular shape of 
the Antarctic isotope maximum events—the classic thermal bipolar 
seesaw signal'!—with a spatially homogeneous expression (Fig. 2f). 
The two-century lag behind Greenland warming identifies PC1 as an 
ocean-propagated response’. 

The second principal component (PC2; 13% of variance explained) 
is a step-like function with a heterogeneous spatial pattern (Fig. 2g). 
This mode is very different from the bipolar seesaw. The PC2 response 
is synchronous with Northern Hemisphere warming within precision 
(28 ± 40 yr lag, 1σ bounds; see Methods); this timing, as well as additional 
evidence presented below, suggests that this mode represents an 
atmospheric teleconnection. The PCA does not necessarily separate 
physical processes. We assume two underlying teleconnections: oceanic 
(two-century lag) and atmospheric (synchronous). Some amount of 
each process is included in PC1, as is evident from some immediate warming 
around t = 0. We perform a rotation of the PCA vectors (Methods) 
to isolate the ‘purely’ oceanic and atmospheric responses (Fig. 2e). The 
associated estimate of the atmospherically forced temperature anomaly 
(Fig. 2h) is cooling at EDML, warming at DF, EDC and TAL, and a negligible 
response at WDC; this pattern is robustly reproduced using different 
methods (Extended Data Fig. 6). The magnitude of the Antarctic 
atmospheric response is roughly proportional to the δ¹⁸O perturbation 
in Greenland ice cores (Extended Data Fig. 4). 
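
To make the PCA step concrete, the decomposition of a set of stacked records into principal components (time series) and empirical orthogonal functions (spatial loadings) can be sketched with a singular value decomposition. This is a generic illustration under stated assumptions: the function name `pca_modes` is hypothetical, the records are only mean-centred (any further normalization used in the Methods is not reproduced), and the rotation of the PCA vectors is not included.

```python
import numpy as np

def pca_modes(stacks):
    """PCA of stacked records via SVD.

    `stacks` has shape (n_times, n_sites). Returns the principal-component
    time series (one column per mode), the EOFs (one row per mode) and the
    fraction of variance explained by each mode.
    """
    # Remove the time mean of each site so the SVD describes variability only.
    anomalies = stacks - stacks.mean(axis=0, keepdims=True)
    u, s, vt = np.linalg.svd(anomalies, full_matrices=False)
    variance_fraction = s**2 / np.sum(s**2)
    pcs = u * s   # time series of each mode
    eofs = vt     # spatial pattern of each mode
    return pcs, eofs, variance_fraction

# Synthetic five-site example: a shared triangular signal plus a weaker
# step with site-dependent sign (shapes and loadings chosen only for illustration).
t = np.linspace(-1.0, 1.0, 201)
shared = 1.0 - np.abs(t)
step = (t >= 0).astype(float)
stacks = np.outer(shared, np.ones(5)) + 0.3 * np.outer(step, [-1.0, 1.0, 1.0, 1.0, 0.0])
pcs, eofs, variance_fraction = pca_modes(stacks)
```

Because the modes are constrained to be orthogonal, a single mode can mix contributions from more than one physical process, which is the reason the text gives for rotating the vectors.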
The Antarctic response to DO cooling is qualitatively similar to the 
DO warming case. The ocean seesaw warming response is delayed by 
226 ± 44 yr and the spatial pattern of the empirical orthogonal function 
EOF2 has the opposite sign—that is, additional warming at EDML and 
cooling on the interior of the East Antarctic Plateau (Extended Data 
Fig. 7). The atmospheric signal over Antarctica is much weaker for 
the DO cooling case, with PC2 explaining only 9% of variance. This 
difference probably arises because DO warmings are more abrupt 
and of larger magnitude than DO coolings. 

To better understand the atmospheric mode, we turn to deuterium 
excess (d), a proxy for vapour source conditions!® commonly used 
to identify changes in atmospheric circulation and vapour transport 
pathways®!”!8. In isotope-enabled GCM simulations, Antarctic d is 
anti-correlated with the Southern Annular Mode (SAM) index®”’. This 
anti-correlation can be understood conceptually: when the Southern 
Hemisphere westerlies are displaced equatorwards (negative SAM 
phase), Antarctic moisture will originate from further north, where 
sea surface temperature (SST) is higher and relative humidity is lower 
(Extended Data Fig. 8b), both of which act to make d more positive!®. 
We use the logarithmic definition of deuterium excess (d_ln), which 
preserves isotopic moisture source information better than the linear 
definition®””. 
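
For orientation, the two definitions can be compared numerically. The linear form d = δD − 8·δ¹⁸O is the standard definition. The quadratic coefficients used below for the logarithmic form (−2.85 × 10⁻² and 8.47) are an assumption taken from the commonly cited form of that definition, not from this text, and should be checked against the Methods before use. Function names are hypothetical; all inputs and outputs are in per mil.

```python
import math

def d_linear(delta_d, delta_18o):
    """Linear deuterium excess, d = dD - 8 * d18O (per mil)."""
    return delta_d - 8.0 * delta_18o

def d_log(delta_d, delta_18o):
    """Logarithmic deuterium excess (per mil).

    Uses ln(1 + delta) expressed in per mil and a quadratic reference curve;
    the coefficients are assumed values (see the lead-in).
    """
    ln_d = 1000.0 * math.log(1.0 + delta_d / 1000.0)
    ln_o = 1000.0 * math.log(1.0 + delta_18o / 1000.0)
    return ln_d - (-2.85e-2 * ln_o**2 + 8.47 * ln_o)

# Illustrative values typical of strongly depleted polar snow (not data from the paper).
example_linear = d_linear(-390.0, -50.0)
example_log = d_log(-390.0, -50.0)
```

The two definitions differ most for strongly depleted precipitation such as that on the East Antarctic Plateau, where the linear slope of 8 no longer follows the curvature of the δD–δ¹⁸O relationship.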

The Antarctic mean d_ln response (Fig. 3a, b) lags behind Northern 
Hemisphere climate by 8 ± 48 yr for DO warming and 9 ± 42 yr for 
DO cooling, consistent with previous results for WDC®. The observed 
d_ln response is consistent with a shift in the meridional position of the 
Southern Hemisphere westerly winds and vapour origin, such that they 
move equatorwards in response to Northern Hemisphere warming and 
polewards in response to cooling. The timing suggests propagation to 
high latitudes of the Southern Hemisphere via an atmospheric teleconnection. 
The d_ln response is largest for the interior Plateau sites 
(DF and EDC), possibly because their vapour source areas are more 
distant from confounding local effects such as the sea-ice edge”!. The 
response is weak or absent at EDML, possibly because the variability of 
Southern Hemisphere westerlies is relatively weak in the Atlantic sector 
(Extended Data Fig. 9), or because of regional effects, such as wind- 
driven changes to the sea ice, gyre circulation or Weddell Sea deep 
convection. Critically, the four cores that do show a clear d_ln response 
collectively sample water vapour from all ocean basins (Extended Data 
Fig. 8a), suggesting that the changes to Southern Hemisphere atmos- 
pheric circulation are zonally coherent and involve all ocean basins 
(rather than just the Pacific basin, as demonstrated previously with 
WDC). 

Figure 4a compares the two independent signals that we attribute 
to a change in atmospheric circulation: PC2 of the δ¹⁸O response and 
the Antarctic mean d_ln response. Their time evolution is nearly identical, 
suggesting that they are distinct but consistent manifestations 
of the atmospheric circulation change. The SAM and Pacific-South 
American (PSA) pattern are the leading modes of large-scale Southern 
Hemisphere atmospheric variability with strong influence on Antarctic 
temperature”? We focus our analysis on East Antarctica, where we 
infer the largest response. The SAM (Fig. 4b) strongly affects East 
Antarctic surface air temperature (SAT; correlation |r| up to 0.65), 
with the correct sign to explain the warming seen at EDC, 
DF and TAL. East Antarctic warming is seen for a more negative SAM 
index, driven primarily by anomalous atmospheric heat advection” 
(the observed cooling at EDML is discussed below). The PSA (Fig. 4c), 
on the other hand, is not meaningfully correlated with the SAT at the 
East Antarctic sites (|r| < 0.15). We further create a synthetic index that 
is the projection of the atmospheric loadings (Fig. 2h) onto the SAT 
anomaly inferred from the reanalysis at the core locations (Methods). 
The patterns in the SAT and the geopotential height associated with 
this index (Fig. 4d) closely resemble those of the SAM, with warming 
in East Antarctica and roughly annular geopotential height anomalies. 

These tests suggest that the SAM is the closest present-day analogue 
to the temperature response that we identify in the ice-core record, 
corroborating our independent evidence from the d_ln data. Although 
Fig. 4 | Attribution of the atmospheric mode of Antarctic temperature 
variability. a, Comparison of the principal component PC2 of the five 
Antarctic δ¹⁸O stacks shown in Fig. 2d (pink; left axis) with the Antarctic 
mean d_ln stack shown in Fig. 3a (black; right axis). Isotope ratios are on the 
VSMOW scale. b, Correlation between a standardized monthly SAM index 
and the SAT (2-m temperature) in ERA-Interim* for 1979-2017 (shading; 
colour bar on the right) with superimposed 850-hPa geopotential height 
regressions (10-m contours) and the ice-core atmospheric temperature 


the PSA pattern may have been active during the DO cycle, it does not 
dominate the Antarctic response. 

Our observation-based inferences on the timing and sign of changes 
to the Southern Hemisphere westerlies and the SAM are consistent 
with coupled atmosphere-ocean GCM simulations in which AMOC 
transitions are induced by North Atlantic freshwater forcing®!*”>. Such 
model simulations show a positive shift in the SAM index in response 
to AMOC shutdown and vice versa (Fig. 3a, b); this shift is synchronous 
with the applied forcing within uncertainty (Extended Data Table 1). 
Our observed atmospheric response is more gradual than the model- 
simulated SAM shift, possibly because of the (multi-decadal) data 
resolution in some cores and the fact that the d_ln signal is integrated 
over a large moisture source area extending to 20° S. 

Next, we address differences between the ice-core data and the 
modern-day correlation pattern (Fig. 4b), most notably at EDML. The 
reanalysis correlation pattern captures the SAT response to monthly 
internal SAM variability, representing atmospheric heat advection 
anomalies”*. The ice cores, on the other hand, record the response to a 
persistent long-term shift in the SAM'*S, driving changes in SST, strat- 
ification and sea ice extent”””®. We speculate that on longer timescales 
the oceanic influence of the Weddell Sea drives the cooling at EDML 
owing to, for example, enhanced sea ice cover” and stratification and 
a weakening of the wind-driven Weddell Gyre. The negligible warming 
response at WDC is consistent with the relatively weak influence of 
the SAM in West Antarctica seen in the monthly reanalysis (Fig. 4b). 
Our observations may help to constrain the long-term response to a 
persistent SAM shift, on which GCMs disagree’. 

Lastly, we want to highlight additional structure in the Antarctic δ¹⁸O 
stacks that is currently not part of scientific discourse on interhemispheric 
climate coupling. Most notably, Antarctic warming appears to 
mode from Fig. 2h (circles; scale bar from Fig. 2). We note that regions 

of anti-correlation are coloured red (that is, warming in response to a 
negative SAM shift). c, As in b, but for a standardized PSA index. The 
SAM and PSA are taken to be the PC1 and PC2 principal components of 
the 850-hPa geopotential height field south of 20° S, respectively. d, As in 
b, but for a synthetic index of the atmospheric mode created by regressing 
ERA-Interim SAT anomalies at the ice-core sites onto the coefficients in 
Fig. 2h (Methods). 


slow down around t = −400 yr (Fig. 2b), forming a secondary change 
point that precedes the abrupt DO warming events””. Likewise, the rate 
of Antarctic cooling appears to increase 200 years before the abrupt 
DO cooling events (Extended Data Fig. 7b). These secondary change 
points are subtler than the ones analysed in this work, have no apparent 
corresponding features in Antarctic d_ln or Greenland climate and 
remain unexplained. 

In conclusion, our results show that Antarctica is influenced by 
abrupt changes in Northern Hemisphere climate on two distinct 
timescales, representing a slow oceanic and a fast atmospheric tele- 
connection. In particular, we provide observational evidence for zonally 
coherent meridional shifts in the Southern Hemisphere westerly winds 
in phase with Greenland DO events and their impact on Antarctic 
temperature. Such shifts have implications for global ocean circulation, 
Southern Ocean upwelling and productivity, and atmospheric CO₂!"°. 
It is therefore paramount to consider the DO cycle from a global, rather 
than a purely North Atlantic, perspective. 


METHODS 


Volcanic ice-core synchronization. Volcanic reference horizons provide the most 
precise way to synchronize ice-core age scales*!~*°. Within the last decade, progress 
has been made in volcanically linking the EDC ice core to the EDML core*!, the 
TAL core*’ and the DF core**. Here we provide new volcanic stratigraphic links 
between the WDC and the EDML, EDC and TAL cores using pattern matching 
of volcanic peaks in high-resolution records of either sulfur (WDC) or sulfate 
(EDML, EDC and TAL). Extended Data Fig. 1b summarizes the various synchro- 
nizations, with previously published ones indicated with grey arrows and the new 
ones presented here indicated with coloured arrows. 

Volcanic synchronization via sulfur, sulfate or electrical conductivity measure- 
ment (ECM) records is a commonly used technique for Antarctic ice cores, and we 
rely on previously established methods described in detail elsewhere*!~*°. Matches 
are made by identifying sequences of sulfur peaks that have the same relative 
spacing in both cores*!-739. Additional confidence in the match points comes from 
approximately proportional acid-concentration levels, smooth variations in the 
resulting annual layer thickness between stratigraphic tie-points, and in some cases 
a distinctive shape of the common signals in the different ice cores. We identify 
773 volcanic ties between WDC and EDML, 396 between WDC and EDC, and 
425 between WDC and TAL (Supplementary Data). 

Stratigraphic matching is performed independently by two of the authors (ana- 
lyst 1 and analyst 2) and then compiled and compared by a third author. Both 
analysts use an iterative approach, in which they first identify the major, unam- 
biguous events. After marking these events (or clusters thereof), they re-align the 
datasets and replot them with the marked events overlapping. At this point, usually 
several of the smaller events are nearly on top of each other. These events are then 
marked and the data are replotted with the newly marked events overlapping, and 
the process is repeated. 

We distinguish three cases: ‘doubly’, ‘singly’ and ‘inconsistently’ identified events. 
Doubly identified matches are cases where both analysts identify the same strati- 
graphic match between two cores for a given volcanic event (within a margin of 
a few centimetres). Singly identified matches are cases where only one of the two 
analysts identifies a stratigraphic match. Inconsistently identified matches are cases 
where a single volcanic peak in one core is linked to two different volcanic peaks 
in the other core. Around 99% of the singly identified events are found by analyst 
1, demonstrating a difference in event detection threshold. For example, in the 
WDC-EDC synchronization the median volcanic peak sizes in WDC non-sea-salt 
sulfur are 67.7 and 31.4 p.p.b. for the doubly and singly identified matches, respec- 
tively, demonstrating that analyst 2 is more conservative in assigning match points. 
For comparison, the background non-sea-salt sulfur level is 15 +4 p.p.b. 

Here, all the doubly identified events are assumed to be correct and retained. In 
the (relatively rare) case of an inconsistently matched event, the stratigraphic links 
suggested by both analysts are discarded to avoid ambiguity. Sequences of singly 
identified events are retained only if they occur in between two doubly identified 
match points. In the case of an inconsistently identified event (which is discarded), 
all singly identified matches adjacent to it are discarded also, until another doubly 
identified match is encountered. We give some examples below of hypothetical 
sequences of tie points, and how they are dealt with. In the following, ‘d’, ‘s’ and ‘i’ 
denote doubly, singly and inconsistently identified tie points, respectively. 
d-s-s-s-d. This is a hypothetical series of three s tie points between two d tie 
points. Because both analysts agree on the tie points on either end, there is no 
reason to assume that the s tie points are incorrect; it simply reflects the fact that 
analyst 2 is more conservative in assigning tie points than analyst 1. Therefore 
all tie points are retained in the final synchronization (d-s-s-s-d); s tie points 
are retained in such cases irrespective of how many are present in the series (for 
example, a series of ten s ties would be retained if bracketed on either side by a d). 
d-i-d. In this case the i tie is removed, but the d ties are retained, resulting in the 
sequence d-d in the final synchronization. 
d-s-s-i-s-d. The i tie is removed in all cases. However, in this example the s 
tie points occur adjacent to an i tie point, which casts doubt on their reliabil- 
ity. Therefore they are removed together with the inconsistent tie, and only the 
sequence d-d is retained in the final synchronization. 
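
The selection rules illustrated by these three sequences can be written as a short filter. This is a sketch of the rules as stated in the text, not the authors' implementation; the function names `retained_mask` and `filter_sequence` are hypothetical.

```python
def retained_mask(labels):
    """Return one keep/discard flag per tie point.

    `labels` is a list of 'd' (doubly), 's' (singly) and 'i' (inconsistently)
    identified tie points in stratigraphic order.
    """
    keep = [False] * len(labels)
    d_positions = [k for k, label in enumerate(labels) if label == "d"]
    # Doubly identified tie points are always retained.
    for k in d_positions:
        keep[k] = True
    # Singly identified tie points are retained only when bracketed by two
    # 'd' ties with no inconsistent tie anywhere in between.
    for left, right in zip(d_positions, d_positions[1:]):
        if "i" not in labels[left + 1:right]:
            for k in range(left + 1, right):
                keep[k] = True
    return keep

def filter_sequence(sequence):
    """Apply the rules to a string such as 'd-s-s-i-s-d'."""
    labels = sequence.split("-")
    keep = retained_mask(labels)
    return "-".join(label for label, flag in zip(labels, keep) if flag)

results = [filter_sequence(s) for s in ("d-s-s-s-d", "d-i-d", "d-s-s-i-s-d")]
```

Applied to the three hypothetical sequences above, the filter returns d-s-s-s-d, d-d and d-d, respectively.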

The matching is described here for the 10-61 kyr interval of interest (the WDC-
EDC synchronization extends back only to 57 kyr BP). In synchronizing WDC and 
EDC, the two analysts identified 473 matches, out of which 103 were doubly, 8 were 
inconsistently and 362 were singly identified (all but two of which were identified 
by analyst 1). The final selection retained the 103 doubly identified matches and 
293 of the singly identified matches; 69 singly identified matches were discarded 
because they were bracketed on either side by an inconsistent match. The 8 incon- 
sistently identified matches all differed by less than 70 cm in EDC (or about 65 yr). 

In synchronizing WDC and EDML, the two analysts identified a total of 793 
matches, out of which 247 were doubly identified, 3 matches were inconsistently 
identified (all adjacent, and not separated by doubly identified events) and 543 
were singly identified (all by analyst 1). The final selection retained the 247 doubly 


identified matches and 526 of the singly identified matches; 17 singly identified 
matches were discarded because they were bracketed on either side by an incon- 
sistent match. The 3 inconsistently identified matches all differed by less than 1.6 m 
in EDML (or about 65 yr). 

TAL proved to be the most difficult of the cores to synchronize, presumably 
because the layers are more strongly compressed in this intermediate-depth core. 
The first attempt at synchronization yielded 55 doubly and 5 inconsistently iden- 
tified matches, with most of the errors in the 770-905 m depth range (up to 5-m 
offsets in TAL). In light of this inconsistency, the analysts reviewed their volcanic 
ties throughout the core, with particular focus on the problematic interval. In the 
revised synchronization, the analysts identified a total of 437 matches, out of which 
253 were doubly identified, 4 matches were inconsistently identified (all adjacent, 
and not separated by doubly identified events) and 180 were singly identified (all 
but 9 by analyst 1). The final selection retained the 253 doubly identified matches 
and 172 of the singly identified matches; 8 singly identified matches were discarded. 
The 4 inconsistently identified matches all differed by less than 75 cm in TAL (or 
about 75 yr). So while the final synchronization shows good agreement between 
the two analysts, we feel obliged to also report the first, less successful attempt. 
The TAL depth range that is hardest to synchronize to WDC (770-905 m TAL 
depth, or 14.9-25.2 kyr BP) also yielded no matches to EDC in a published study” 
(see yellow triangles at the bottom of Extended Data Fig. 1a). We note that this 
problematic time interval of 15-25 kyr BP does not influence the main results of 
this work: none of the abrupt DO events used in the stacking lie in this interval 
(DO event 2 is excluded from the stacking because the absence of an abrupt CH₄ 
signal precludes precise synchronization to Greenland'*). 

The number of doubly and inconsistently identified events provides one way 
to assess the reliability of the volcanic synchronization; the percentage of incon- 
sistently identified events (out of the pool of doubly and inconsistently identified 
events) ranges from 1% to 7% for the three cores. Errors tend to occur in clusters 
of adjacent picks owing to the misidentification of a sequence of peaks; seen in this 
light, each of the WDC-EDML and WDC-TAL synchronizations contains only a 
single inconsistently identified sequence. The observed inconsistencies are always 
relatively small and of the order of a few decades. A second method of assessing 
the reliability of the volcanic synchronization is via the redundancy offered by 
having multiple cores. Whenever three ice cores in Extended Data Fig. 1b are 
connected via three independent synchronizations (that is, whenever the arrows 
form a triangle), this offers the possibility to test the internal consistency of the 
synchronization. Over the age interval of interest (the last 61 kyr), 213 ties had 
been previously identified between EDML and EDC”, as well as 102 ties between 
TAL and EDC. This introduces a degree of redundancy that allows testing the 
internal consistency of the synchronization. EDC, TAL and EDML are volcanically 
synchronized within the AICC2012 (Antarctic Ice Core Chronology), whereas 
WDC uses the independent WD2014 timescale*”**. For each volcanic tie point, the
difference between the WD2014 age and the AICC2012 age is shown in Extended
Data Fig. 1a. EDC and EDML are volcanically synchronized over the last 60 kyr
(blue triangles), and therefore the excellent agreement between the blue (WDC 
age minus EDML age) and red (WDC age minus EDC age) curves in Extended 
Data Fig. 1a shows the volcanic framework to be internally consistent. Likewise, 
for the period 25-42 kyr BP and <13 kyr BP, where TAL and EDC are volcanically
synchronized (yellow triangles), the yellow (WDC age minus TAL age) and red
curves agree well, suggesting internal consistency. Beyond 42 kyr BP the yellow
curve deviates from the blue and red ones, suggesting that the TAL core is imper- 
fectly synchronized within the AICC2012 chronology (owing to an absence of 
volcanic ties at the time). 
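
The percentage of inconsistently identified events quoted above can be checked directly from the match tallies. The following is a minimal sketch for the two synchronizations whose counts are given in full in this section (WDC-EDML and the revised WDC-TAL); the function name and dictionary layout are illustrative and not taken from the authors' code.

```python
# Match tallies quoted in the text for the WDC-EDML and revised WDC-TAL synchronizations.
tallies = {
    "WDC-EDML": {"total": 793, "double": 247, "inconsistent": 3, "single": 543},
    "WDC-TAL": {"total": 437, "double": 253, "inconsistent": 4, "single": 180},
}

def inconsistent_percentage(double, inconsistent):
    """Inconsistent matches as a percentage of the doubly + inconsistently identified pool."""
    return 100.0 * inconsistent / (double + inconsistent)

for name, t in tallies.items():
    # The three categories should add up to the reported total number of matches.
    assert t["double"] + t["inconsistent"] + t["single"] == t["total"]
    pct = inconsistent_percentage(t["double"], t["inconsistent"])
    print(f"{name}: {pct:.1f}% inconsistently identified")
```

Both values fall at the low end of the 1% to 7% range stated in the text.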

All ice cores used in this study were synchronized to the WD2014 chronol- 
ogy*”**. For the four non-WDC cores, we start from their original ice-core 
chronologies; this is the AICC2012 chronology*® for the EDML, EDC and TAL 
ice cores, and the DF2006 chronology for the DF core*’. For each core we add
a time-variable offset that transfers it to the WD2014 chronology; the offset is obtained by linear
interpolation between the volcanic tie points. For the EDML, EDC and TAL cores
we have direct synchronizations to WDC as described above. For the DF core, 
synchronization is indirect via the EDC core (Extended Data Fig. 1b). Previously, 
297 tie points had been identified** between EDC and DF in the interval of interest 
(mean spacing of 173 years). These volcanic ties are transferred to WDC using the 
WDC-EDC synchronization described above. 
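
The transfer of a core to WD2014 by a linearly interpolated offset can be sketched as follows. This is an illustration only: the tie-point values are hypothetical, the offset is defined here as WD2014 age minus original age (matching the sign convention used below in the text), and holding the offset constant outside the tie-point range is an assumption of this sketch.

```python
from bisect import bisect_right

def offset_at(age, tie_ages, tie_offsets):
    """Linearly interpolate the chronology offset at `age` (original timescale, yr).

    tie_ages must be sorted ascending. Outside the tie-point range the nearest
    tie-point offset is held constant (an assumption of this sketch).
    """
    if age <= tie_ages[0]:
        return tie_offsets[0]
    if age >= tie_ages[-1]:
        return tie_offsets[-1]
    i = bisect_right(tie_ages, age) - 1
    frac = (age - tie_ages[i]) / (tie_ages[i + 1] - tie_ages[i])
    return tie_offsets[i] + frac * (tie_offsets[i + 1] - tie_offsets[i])

def to_wd2014(age, tie_ages, tie_offsets):
    """Age on the WD2014 timescale = original age + interpolated offset."""
    return age + offset_at(age, tie_ages, tie_offsets)

# Hypothetical tie points: age on the original chronology and offset
# (WD2014 minus original), both in years.
tie_ages = [10000.0, 10200.0, 10500.0]
tie_offsets = [-20.0, 0.0, 30.0]
print(to_wd2014(10100.0, tie_ages, tie_offsets))
```

For the DF core, the same function would be applied twice in sequence (DF to EDC, then EDC to WDC), since that synchronization is indirect.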

Over the time interval of interest, the offset of the AICC2012 cores (EDML, 
EDC and TAL) ranges roughly from −330 yr to +430 yr, with an average offset of
−10 yr (with negative values meaning that WD2014 is younger than AICC2012).
For the DF core the range is from −230 to 1,884 yr, with an average of +739 yr.

Uncertainty in volcanic synchronization. There are two types of uncertainty to
consider. First, the volcanic ties themselves may be incorrect; second, between ties 
the interpolation strategy introduces an error. The first type is difficult to quantify. 
Either the ties are correctly identified and the relative age uncertainty is zero, or 
the ties are false and the relative age uncertainty is infinite (that is, we have learned 
nothing). Past studies have sometimes assigned Gaussian errors to volcanic tie
points; although this is a practical necessity for applications that optimize the fit to 
multiple age constraints***°”, it does not reflect the true uncertainty meaningfully 
and is not applied here. 

We have high confidence in the correctness of the volcanic ties because of 
the internal consistency of the new volcanic ties with previously published ones 
(Extended Data Fig. 1a), and the fact that doubly identified ties greatly outnumber 
the inconsistently identified ones. We have removed inconsistent matches from the 
synchronization, and we assume the remaining matches to be correct. 

The second type of uncertainty is due to interpolation between volcanic ties. 
This introduces an age uncertainty that depends on the age difference between 
adjacent ties, L. We estimate the interpolation uncertainty using the layer-counted 
section of the WD2014 chronology, which goes back to 31.2 kyr BP. To estimate the
interpolation uncertainty for two volcanic markers that are, say, L = 100 yr apart,
we can randomly pick thousands of 100-yr intervals from the WD2014 chronology.
Within each of these, the age evolution deviates from the assumed linear
interpolation, and 1σ of this age deviation is used as the interpolation uncertainty.
Typical results are shown in Extended Data Fig. 2a for several values of L. We find 
that the maximum uncertainty scales as xL?. Compared with East Antarctica, 
West Antarctica receives a larger contribution of its snowfall from storm systems 
and synoptic activity, making accumulation rates more variable*; the estimates 
given here should thus be considered conservative when used in the interior of 
Antarctica. The volcanic interpolation uncertainty for the four cores is plotted in 
Extended Data Fig. 2b. For DF, synchronization to WDC is done using EDC as
an intermediary core, and therefore the two synchronization errors are added in 
quadrature. 

In our synchronization we use both the singly and doubly identified tie points
and treat them equally. We acknowledge, however, that the doubly identified ties
are more reliable. Therefore, we repeat the analyses described in the main text 
using only the doubly identified tie points (as opposed to both singly and doubly 
identified tie points) and we find that the conclusions of this work do not depend 
on this choice of tie points. Those tests are not shown here, but alternative chronol- 
ogies that use only the doubly identified tie points and alternative versions of the 
manuscript figures showing those analyses are available from the corresponding 
author upon request. 

Ice-core water stable-isotope data. A combination of previously published and
unpublished ice-core water isotope data (δ¹⁸O and δD; where δD represents the
²H/¹H composition) are used in this study. The deuterium excess (d_ln) is calculated
from the δ¹⁸O and δD isotope ratios using the logarithmic definition of ref. *°:


$$d_{\ln} = \ln(1 + \delta\mathrm{D}) - 8.47\,\ln(1 + \delta^{18}\mathrm{O}) + 0.0285\,[\ln(1 + \delta^{18}\mathrm{O})]^{2}$$
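
The logarithmic definition above can be evaluated as in the following sketch. It assumes that the logarithmic terms are expressed in per mil, that is, as 1000·ln(1 + δ) with δ converted from per mil to a fraction; the text does not spell out the units, so this convention is an assumption of the example, as is the function name.

```python
import math

def d_ln(delta_d_permil, delta_18o_permil):
    """Logarithmic deuterium excess (per mil) from delta values given in per mil."""
    # Assumption: logarithmic terms are expressed in per mil, i.e. 1000*ln(1 + delta).
    ln_d = 1000.0 * math.log(1.0 + delta_d_permil / 1000.0)
    ln_18o = 1000.0 * math.log(1.0 + delta_18o_permil / 1000.0)
    return ln_d - 8.47 * ln_18o + 0.0285 * ln_18o ** 2

# Hypothetical Antarctic-like sample values (per mil).
print(d_ln(-400.0, -50.0))
```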


For WDC we use previously published δ¹⁸O and δD data*!>5 measured using laser
spectroscopy. The data have a typical depth resolution of 1 m for the 0-2.3 kyr BP
interval, of 0.5 m for 2.3-56 kyr BP and of 0.25 m for 56-68 kyr BP; this corresponds
to an average time resolution of 17 yr for the study period (11-61 kyr BP). Using
the centimetre-scale continuous-flow-analysis record of WDC δ¹⁸O instead gives
identical results to those presented here“®.

For EDML we use previously published δ¹⁸O and δD data*”“® measured using
conventional isotope ratio mass spectrometry (IRMS). The data have a typical 
depth resolution of 0.5 m, corresponding to an average time resolution of 24 yr 
for the study period. 

For EDC we use previously published δ¹⁸O and δD data‘*“? measured using con-
ventional IRMS. The data have a typical depth resolution of 0.55 m, corresponding 
to an average time resolution of 44 yr for the study period. 

For DF we use new and published water isotope data*”*”*!. Two datasets are 
used. The first contains δ¹⁸O data measured using IRMS in the 300-1,151 m depth
range (10-71 kyr BP) at 0.5-m resolution®°. The second is a set of δ¹⁸O and δD data
measured using IRMS in the 550-849 m depth range (23-45 kyr BP); this dataset
was measured at 0.1-m resolution and averaged into 0.5-m bins. We note that d_ln
is only available from the second dataset, which spans DO events 2-11 (DO event
numbering is shown in Fig. 1). In the depth range of overlap, δ¹⁸O data from both
datasets are averaged (after correcting the second dataset by +-0.213‰ to account
for a calibration offset) to produce the final time series. The combined δ¹⁸O record
has an average time resolution of 35 yr for the study period. 

For TAL we use a combination of new and previously published!*°**? data. 
Bag-average, 1-m resolution δ¹⁸O and δD data measured using IRMS are available
for the entire core. High-resolution (0.1 m) δ¹⁸O data measured using IRMS are
available in the 598-786 m (10-16 kyr BP) and 1,030-1,282 m (37-65 kyr BP) depth
ranges. High-resolution δ¹⁸O data are averaged into 0.5-m bins and combined with
bag-average data for the remaining depth intervals. The combined δ¹⁸O record has
an average time resolution of 50 yr for the study period. 

Stacking procedure. The stacking procedure used to investigate the Antarctic 
climate response to abrupt DO variability is described in detail elsewhere’. 
In short, the individual Greenland events are aligned at the midpoint of their abrupt
δ¹⁸O transitions (either DO warming or DO cooling). All Antarctic events (on
their volcanically synchronized WD2014 timescales) are aligned at the midpoints
of the abrupt WDC CH4 transitions. We then average over events to obtain the
shared climatic signal; to derive the north-south phasing we use the independently
established CH4 delay of 56 ± 19 yr (1σ) behind δ¹⁸O in Greenland™.
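
The align-and-average step can be sketched as follows. This is a simplified illustration with hypothetical step-like events; the function names, the linear resampling onto a common grid and the event ages are assumptions of the example, and the full procedure (including the 56-yr CH4 phasing correction) is described in the cited reference rather than reproduced here.

```python
def interp(t, times, values):
    """Linear interpolation of (times, values) at t; times must be ascending."""
    if t <= times[0]:
        return values[0]
    if t >= times[-1]:
        return values[-1]
    for i in range(len(times) - 1):
        if times[i] <= t <= times[i + 1]:
            w = (t - times[i]) / (times[i + 1] - times[i])
            return values[i] + w * (values[i + 1] - values[i])

def stack_events(events, grid):
    """Average several events on a common time axis relative to each midpoint.

    events: list of (ages, values, midpoint_age); ages in yr BP, ascending.
    grid: times relative to the transition (yr), positive = after the event.
    """
    stacked = []
    for t in grid:
        # Ages count backwards in time, so "t yr after" the midpoint is midpoint - t.
        samples = [interp(mid - t, ages, vals) for ages, vals, mid in events]
        stacked.append(sum(samples) / len(samples))
    return stacked

def make_event(mid, amplitude):
    """Hypothetical step-like event sampled every 10 yr around its midpoint."""
    ages = [mid - 500.0 + 10.0 * k for k in range(101)]
    values = [amplitude if a < mid else 0.0 for a in ages]
    return ages, values, mid

events = [make_event(11700.0, 1.0), make_event(14700.0, 3.0)]
grid = [-100, 0, 100]
stacked = stack_events(events, grid)
print(stacked)
```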

For one of the abrupt events (DO 10), improved inter-polar synchronization 
data are available from ¹⁰Be variations during the Laschamp event (41 kyr BP)
between the Greenland NGRIP core and the Antarctic EDC and EDML cores”;
these timing constraints are incorporated into our stacking procedure (DO 10
only). The EDML, EDC, DF and TAL ice cores have CH4 records with much higher
(gas age–ice age) difference (Δage) uncertainty and lower resolution than WDC;
therefore the north-south phasing precision cannot be improved by considering
CH4 synchronization for those cores also.

In this work we consider DO 0 (that is, the Younger Dryas–Holocene transition)
through DO 16; DO 17 falls outside the volcanic synchronization for the EDC and
DF cores. DO 2 is omitted owing to the absence of a clear CH4 response, precluding
synchronization. All stacked records are shown in Extended Data Fig. 3. 

Uncertainty in the timing of the stacked records comes from the following 
sources: (1) the Δage in the WDC core®’; (2) DO midpoint detection in the abrupt
NGRIP δ¹⁸O record; (3) DO midpoint detection in the abrupt WDC CH4 record;
(4) interpolation of the WDC chronology between tie points; (5) the climatic lag
of 56 ± 19 yr (1σ) of atmospheric CH4 behind Greenland δ¹⁸O*; (6) trend analysis
in the BREAKFIT* and RAMPFIT™ fitting routines; and (7) volcanic synchroni- 
zation to the WD2014 chronology. 

The combined uncertainty due to the first five items was assessed previously 
using a Monte Carlo routine, suggesting a 1σ timing uncertainty of 38 yr and 41 yr
in the WDC stacks for DO warming and DO cooling, respectively'®. The uncertainty
in the trend analysis is given in Extended Data Table 1. The uncertainty in
the volcanic synchronization is shown in Extended Data Fig. 2; averaged over the
stacked events, the 1σ synchronization uncertainty is 0 yr at WDC, 2 yr at EDML,
6 yr at EDC, 13 yr at DF and 2 yr at TAL. 

By stacking only the most prominent DO events (those following Heinrich 
events; that is, DO events 0, 1, 4, 8, 12 and 14) or just the minor ones (the remain- 
der), we find that the magnitude of the atmospherically forced Antarctic response 
is larger for the former, suggesting proportionality with the climate perturbation 
(Extended Data Fig. 4a-f). Proportionality of the atmospheric response is further 
seen for individual events (Extended Data Fig. 4g; see the figure legend for details). 
PCA. PCA allows different climatic modes to be identified in (palaeoclimatic) 
time series from different locations®’. Here we perform PCA on the stacked δ¹⁸O
records at the five individual sites using the MATLAB function pca. The δ¹⁸O
stacks discussed in the main text combine 19 individual events, namely all those that
fall within the volcanic synchronization interval. To assess the sensitivity of our
conclusions to including or excluding individual events, we perform additional
experiments in which we stack only n events (rather than all 19). The n events are
randomly sampled without replacement (that is, any given event cannot be picked 
twice for each stacking). We then perform PCA of these stacked records (the same 
events are stacked for each of the cores) and standardize the principal component 
vectors by taking the z-score (or standard score). Extended Data Fig. 5 shows typical
results for n = 2 and n = 8, where for each n we repeat the experiment 50,000
times to obtain reliable statistics. Extended Data Fig. 5a, b shows PC1 and PC2,
respectively, with the solid line showing the mean of 50,000 experiments and the
shaded envelope the associated ±1σ. Extended Data Fig. 5c shows a histogram of
the percentage of variance explained by each of the modes. 
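
The subsampling experiment can be sketched as follows. This illustration is not the authors' MATLAB code: it uses synthetic ramp-like events at five hypothetical sites, extracts only PC1 (by power iteration on the site covariance matrix, in place of the MATLAB function pca), and repeats each experiment 200 times rather than 50,000 to keep the example fast. All names and parameter values are assumptions of the sketch.

```python
import random
import statistics

def stack(events):
    """Average a list of equal-length event series point by point."""
    return [sum(col) / len(events) for col in zip(*events)]

def leading_pc(series_by_site, iters=100):
    """First principal component (time series) of site records via power iteration."""
    # PCA operates on anomalies, so remove each site's time mean first.
    anomalies = [[x - statistics.fmean(s) for x in s] for s in series_by_site]
    m = len(anomalies)
    cov = [[sum(a * b for a, b in zip(anomalies[i], anomalies[j])) for j in range(m)]
           for i in range(m)]
    vec = [1.0 + 0.1 * i for i in range(m)]
    for _ in range(iters):
        new = [sum(cov[i][j] * vec[j] for j in range(m)) for i in range(m)]
        norm = sum(v * v for v in new) ** 0.5
        vec = [v / norm for v in new]
    # Project the anomalies onto the loading vector to obtain the PC time series.
    return [sum(vec[i] * anomalies[i][k] for i in range(m))
            for k in range(len(anomalies[0]))]

def zscore(x):
    mu, sd = statistics.fmean(x), statistics.pstdev(x)
    return [(v - mu) / sd for v in x]

def subsampled_pc1(events_by_site, n, n_repeats, seed=0):
    """Mean and spread of z-scored PC1 when only n randomly chosen events are stacked."""
    rng = random.Random(seed)
    n_events = len(events_by_site[0])
    runs = []
    for _ in range(n_repeats):
        picks = rng.sample(range(n_events), n)  # sampling without replacement
        # The same events are stacked for each of the cores.
        stacks = [stack([site[e] for e in picks]) for site in events_by_site]
        pc = zscore(leading_pc(stacks))
        # The sign of a principal component is arbitrary; fix it so runs are comparable.
        if pc[-1] < pc[0]:
            pc = [-v for v in pc]
        runs.append(pc)
    mean = [statistics.fmean(col) for col in zip(*runs)]
    spread = [statistics.pstdev(col) for col in zip(*runs)]
    return mean, spread

# Synthetic data (hypothetical): 5 sites x 19 events, a shared ramp plus noise.
rng = random.Random(1)
times = list(range(-500, 501, 50))
site_gain = [1.0, 0.8, 1.2, 0.9, 1.1]
events_by_site = [
    [[g * (t / 500.0) + rng.gauss(0.0, 0.3) for t in times] for _ in range(19)]
    for g in site_gain
]
mean2, spread2 = subsampled_pc1(events_by_site, n=2, n_repeats=200)
mean8, spread8 = subsampled_pc1(events_by_site, n=8, n_repeats=200)
print(statistics.fmean(spread2), statistics.fmean(spread8))
```

In this toy setting the envelope for n = 8 should be narrower than for n = 2, in line with the signal-to-noise argument made in the text.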

We find that even by stacking as few as just n = 2 events, the method can, on 
average, identify the oceanic and atmospheric components described in the main 
text. Perhaps not surprisingly, we find that when including fewer events the sig- 
nal-to-noise ratio decreases: with fewer events the estimated signal amplitude is 
smaller in both PC1 and PC2, and the uncertainty envelope is wider. As more 
events are included in the stacking, the percentage of variance explained by PC1 
increases as the coherence between the various Antarctic cores increases owing to 
the improved signal-to-noise ratio. 

The analysis discussed in the main text uses a 1,500-yr window (centred around
t = 100 yr), which is chosen because it corresponds to the recurrence time of the
shortest DO cycles**“*!. The variance explained by the oceanic and atmospheric
modes depends on the window length, as shown in Extended Data Fig. 6a. For 
window lengths exceeding about 750 yr, the oceanic mode explains most of the 
variance (PC1), with the atmospheric mode explaining less. However, at short 
window lengths (<750 yr) the atmospheric mode explains most of the variance, 
making it PC1. The comparison of PC1 at a 400-yr window with PC2 at a 2,000-yr
window (Extended Data Fig. 6b) illustrates the ability of the PCA to identify the
atmospheric mode at different window lengths. The crossover behaviour (that is,
the atmospheric mode shifts from being PC2 to being PC1 as a function of window
length) is due to the fact that the signal variance of the step-like atmospheric mode
occurs chiefly within the t = 0 to t = 100 yr interval; signal variance of the oceanic
mode depends strongly on the window length owing to its gradual nature. 
Lag-time analysis of PC1 and PC2 using the RAMPFIT* and BREAKFIT™ 
routines confirms the crossover behaviour. At short window lengths (<700 yr), 
PC1 is characterized by the instantaneous or fast response of the hypothesized 
atmospheric teleconnection, whereas at large window lengths (>800 yr) it shows 
the hypothesized 200-yr delayed oceanic teleconnection. PC2 shows the opposite 
behaviour; however, we note that for PC2 at a 400-yr window length no meaning- 
ful solution can be found with either routine. At window lengths >700 yr the lag 
times remain stable and vary only within the uncertainty bound. Unless specified 
otherwise, a 1,500-yr window is used in this work. 
Robustness of the atmospheric spatial pattern. The spatial pattern that we iden- 
tify for the atmospheric mode is one of the main results of this work, and we here 
test its robustness. In Extended Data Fig. 6d-f we compare three alternative ways 
of identifying the spatial pattern; the Pearson correlation coefficient (r) between 
the shown pattern and the pattern identified in the main text (EOF2 at a 1,500-yr 
window) is given for each. Details on the three methods are given in the figure leg- 
end. Correlation coefficients ranging from r= 0.92 to r= 0.999 suggest that identi- 
fication of the atmospheric pattern is both qualitatively and quantitatively robust. 
Rotated PCA vectors. PCA aims to explain the largest amount of variance, whereas 
our goal is to distinguish between the oceanic and atmospheric modes. While 
PC1 is clearly dominated by the 200-yr-delayed oceanic bipolar seesaw (Fig. 2d), 
it appears that PC1 also captures some abrupt warming around t=0, apparently 
because atmospherically induced warming is more prevalent over Antarctica than 
cooling (Northern Hemisphere DO warming case). We therefore construct (admit- 
tedly somewhat subjective) oceanic and atmospheric response functions (Fig. 2e), 
which are derived from the principal components in the following way. Let PC1 
and PC2 be the first two principal components through time; these vectors are 
perpendicular. We let the atmospheric response, ATM, be identical to PC2. The 
oceanic response, OCE, is found by rotating the PC1 vector in the plane spanned
by vectors PC1 and PC2 over an angle of −13°:

$$\mathrm{OCE} = \cos(-13^{\circ})\,\mathrm{PC1} + \sin(-13^{\circ})\,\mathrm{PC2}$$

The rotation angle of −13° is picked such that the δ¹⁸O shift at t = 0 is minimized.
The spatial pattern associated with ATM (Fig. 2h) is found by multiple linear
regression of the δ¹⁸O stacks at the individual sites to OCE and ATM using the
MATLAB function regress. 
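
The rotation and the choice of angle can be illustrated with a short sketch. The two input vectors here are toy series constructed so that the answer is known in advance (they are not orthogonal, zero-mean principal components), and measuring the shift at t = 0 as the jump between two adjacent samples is an assumption of this example rather than the authors' stated criterion.

```python
import math

def rotate(pc1, pc2, angle_deg):
    """Rotate PC1 within the plane spanned by PC1 and PC2 by angle_deg."""
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    return [c * a + s * b for a, b in zip(pc1, pc2)]

def best_angle(pc1, pc2, i_before, i_after, candidates):
    """Angle whose rotated series has the smallest jump between two samples around t = 0."""
    def jump(angle):
        oce = rotate(pc1, pc2, angle)
        return abs(oce[i_after] - oce[i_before])
    return min(candidates, key=jump)

# Toy vectors: PC2 carries a unit step at t = 0, and PC1 is contaminated by a
# step of size tan(13 deg), so a rotation of -13 deg removes the jump exactly.
step = math.tan(math.radians(13.0))
pc1 = [0.0, 0.0, step, step]
pc2 = [0.0, 0.0, 1.0, 1.0]
angle = best_angle(pc1, pc2, i_before=1, i_after=2, candidates=range(-45, 46))
print(angle)
```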

SAM, PSA and synthetic ‘atmospheric’ indices. The SAM and PSA indices are 
calculated from climate-model and reanalysis data as the first and second mode of 
variability, respectively, in the PCA/EOF analysis (MATLAB function pca) of the 
sea-level pressure (model) or the 850-hPa geopotential height (reanalysis) south 
of 20° S, after subtracting the long-term mean and scaling the anomalies by the 
square root of the cosine of the latitude to account for the decreased surface area 
closer to the poles. A synthetic index of the inferred atmospheric teleconnection 
is created by projecting the reanalysis SAT anomalies at the ice-core locations 
onto the atmospheric loading coefficients from Fig. 2h, the spatial signature of 
which is shown in Fig. 4d. Mathematically, this projection is done by multiplying 
the SAT anomalies with the loading coefficients and summing over them at each 
monthly time step. 
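
The projection described above amounts to a weighted sum at each time step, and the latitude scaling used before the EOF analysis is a single square-root factor. Both are sketched below; the loading coefficients and anomalies are hypothetical placeholders, not the values from Fig. 2h.

```python
import math

def area_weight(lat_deg):
    """sqrt(cos(latitude)) factor applied to anomalies before the EOF analysis."""
    return math.sqrt(math.cos(math.radians(lat_deg)))

def synthetic_index(sat_anomalies, loadings):
    """Project site anomalies onto loading coefficients, one value per time step.

    sat_anomalies: list over time steps, each a list with one anomaly per site.
    loadings: one coefficient per site, in the same site order.
    """
    return [sum(a * w for a, w in zip(step, loadings)) for step in sat_anomalies]

# Hypothetical loading coefficients for five sites and two monthly time steps.
loadings = [0.5, 0.4, -0.3, 0.1, 0.2]
sat_anomalies = [
    [1.0, 1.0, 1.0, 1.0, 1.0],
    [2.0, 0.0, 0.0, 0.0, 0.0],
]
index = synthetic_index(sat_anomalies, loadings)
print(index, area_weight(-60.0))
```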

It is worth noting that all these inferences with respect to reanalysis data rely 
upon just five atmospheric loading coefficients from the limited (from the per- 
spective of large-scale circulation) spatial domain of Antarctica, so it is difficult 
to rigorously exclude non-SAM atmospheric influences from the temperature 
pattern alone. 

In Extended Data Fig. 9 we compare the variability in the Southern Hemisphere 
westerly winds calculated with glacial CCSM3 climate simulations” with that 
obtained from the ERA-Interim reanalysis*°. Both results show greater variability 
in the Indian and Pacific sectors than in the Atlantic sector. Compared to ERA- 
Interim, the CCSM3 SAM is more zonal/annular in its structure; the CCSM3 west- 
erlies also appear to have smaller variability (some of this difference could be due to 
comparing annual mean with decadal mean data). In CCSM3 the forced response 
of the Southern Hemisphere westerlies (at 19 kyr BP, in response to increased
North Atlantic freshwater; right panel) is very similar to the internal variability of 
the Southern Hemisphere westerlies (middle panel), suggesting that the change in 
the Southern Hemisphere atmospheric circulation induced by freshwater forcing 
in the North Atlantic is analogous to the existing mode of internal variability. 

To estimate the magnitude of SAM shifts of the DO cycle, we analyse the 
changes in central East Antarctica, where the signal is the largest and most consistent
with the observed present-day relationship (Fig. 4). Using an isotope sensitivity
of 0.8‰ K⁻¹, the atmospherically induced temperature anomaly in central
East Antarctica (DF, EDC) is in the range 0.20-0.45 °C during DO warming (the
lower and upper bounds reflect typical minor and major DO events, respectively).
Regression of the ERA-Interim 2-m temperature at DF and EDC to the SAM index
shows a slope of around −1.2 °C per unit of normalized SAM (the normalized SAM
index time series has a standard deviation of 1), implying a shift in the SAM index 
of around 0.2 to 0.4 normalized (modern-day) SAM units (rounded to one sig- 
nificant figure). This estimate assumes: (1) a linear isotope-temperature response 
using the modern-day spatial slope, and (2) that the monthly SAM-SAT regression 
from the monthly internal SAM variability also applies to persistent SAM shifts 
during the glacial DO cycle; both assumptions are subject to uncertainties that we 
do not address here. The CCSM3 model simulates a persisting SAM shift of the 
same magnitude as the internal SAM variability in decadal averaged data (Fig. 3); 
because internal SAM variability will be larger in monthly than in decadally averaged
data, the model and data-based estimates may be in agreement. The reanalysis
time period is too short to derive robust estimates of decadally averaged SAM 
variability. 
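
The arithmetic behind the quoted SAM shift can be laid out explicitly. The sketch below uses only the numbers stated above (0.8‰ K⁻¹, 0.20-0.45 °C and −1.2 °C per SAM unit) and the two linearity assumptions listed in the text; the variable and function names are illustrative.

```python
ISOTOPE_SENSITIVITY = 0.8    # per mil per K, as quoted in the text
SAM_SLOPE = -1.2             # degC per unit of normalized SAM, as quoted in the text

def sam_shift(temperature_anomaly):
    """SAM index change implied by a temperature anomaly (degC) under a linear regression."""
    return temperature_anomaly / SAM_SLOPE

for label, d_temp in (("minor events", 0.20), ("major events", 0.45)):
    d18o = ISOTOPE_SENSITIVITY * d_temp   # isotope anomaly corresponding to d_temp
    shift = sam_shift(d_temp)
    print(f"{label}: {d18o:.2f} per mil, SAM shift {shift:.2f} "
          f"(magnitude about {abs(shift):.1f})")
```

The sign of the shift is negative because warming in central East Antarctica corresponds to a decrease in the SAM index under the quoted regression slope; the text reports the magnitude.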

GCM simulations. We analyse previously published TraCE-21k transient climate 
model simulations with CCSM3****-*°. AMOC collapse and resumption are triggered
using freshwater forcing in the North Atlantic. The ‘DGL19k’ run is used for
the AMOC collapse and the ‘DGL-overshoot-C’ run for the AMOC resumption
case™.

Moisture origin analysis. We use two separate experiments to trace the moisture 
origin of the precipitation at the coring sites. The first method uses a Lagrangian 
moisture source diagnostic® based on a previously published dataset®”. Using the 
winds, temperature and humidity of the ERA-Interim reanalysis dataset*” covering 
the years 1980-2013, we calculate 5 million air parcel trajectories covering the 
global atmosphere at a resolution of 6 h using the Lagrangian particle dispersion 
model FLEXPART™. From the analysis of specific humidity changes over time 
along the air parcel trajectories, moisture sources are identified whenever the specific
humidity in the air parcels increases more than a threshold value of 0.1 g kg⁻¹
over a 6-h period. The fractional contribution of each moisture source to the final
precipitation at the target location (an area of 300 × 300 km² centred on each
ice-core site) is obtained from calculating the amount contributed by a moisture 
source to the humidity already present in an air parcel. For precipitation en route, 
the contributions of previous sources are proportionally discounted. This results 
in a quantitative estimate of the contribution of surface areas to the precipitation 
in the target region in units of evaporation (water mass per unit area per unit 
time), including their position (latitude and longitude). The moisture source con- 
tributions for each site and precipitation event are combined into mass-weighted 
annual mean values and scaled with respect to the total amount of water deposited 
at the target region. 
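
The uptake-and-discount bookkeeping along a single trajectory can be sketched as follows. This is a simplified illustration of the idea described above, not the published diagnostic: it tracks only the humidity series of one hypothetical parcel, ignores position and mass weighting, and the function name and example values are assumptions.

```python
def attribute_sources(q_series, threshold=0.1):
    """Attribute the final humidity of an air parcel to uptake events along its path.

    q_series: specific humidity (g/kg) at successive 6-h steps, oldest first.
    Returns (fractions, unattributed): fractions[i] is the share of the final
    humidity taken up during step i (between samples i and i + 1).
    """
    amounts = {}  # step index -> g/kg still present from that uptake
    for i in range(1, len(q_series)):
        dq = q_series[i] - q_series[i - 1]
        if dq > threshold:
            # Humidity rose by more than the threshold: count it as a moisture source.
            amounts[i - 1] = dq
        elif dq < 0 and q_series[i - 1] > 0:
            # Precipitation en route: discount all earlier sources proportionally.
            scale = q_series[i] / q_series[i - 1]
            amounts = {k: v * scale for k, v in amounts.items()}
    q_final = q_series[-1]
    fractions = {k: v / q_final for k, v in amounts.items()}
    return fractions, 1.0 - sum(fractions.values())

# Hypothetical parcel: uptake, rain-out, then a second uptake before arrival.
fractions, unattributed = attribute_sources([1.0, 2.0, 1.5, 2.5])
print(fractions, unattributed)
```

The unattributed remainder is moisture already present at the start of the trajectory or taken up in increments below the threshold.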

The second method uses water tagging in a 50-yr simulation with the 
Community Atmosphere Model (CAM) with prescribed seasonally varying SST 
and modern boundary conditions. The experiment is set up to evaluate the merid- 
ional moisture source distribution, with further details and figures given in ref. °. 
In short, evaporation is tagged in 11 bins. One bin is the Antarctic continent 
(re-evaporation) and the ocean and ice shelves south of 70° S; the other ten bins 
are zonal bins of 5° latitude each, ranging from 20° S to 70° S. For each core, the 
moisture source distribution is found by evaluating the relative contributions from 
each of the bins for the last 30 years of the run. Two moisture source distributions 
are created, one for all years in which the mean annual SAM index was positive, 
and one for all years in which it was negative (Extended Data Fig. 8b). 
Change-point detection. We use two well documented and widely used change- 
point detection methods, BREAKFIT>® and RAMPFIT™, with the results given in 
Extended Data Table 1. The choice of method is based on the shape of the time 
series x(t). BREAKFIT finds a single change point and fits a linear slope on either 
side; these features make it suitable for the oceanic mode/PC1 discussed in the 
main manuscript. RAMPFIT finds two change points; it assumes that x(t) has a
constant value x1 for t < t1, then is ramped up or down until it reaches value x2 at
time t2, after which it remains constant at value x2 for t > t2. These features make
RAMPFIT suitable for the atmospheric mode/PC2 discussed in the main manuscript.
To find the two change points in the d_ln stacks, we apply both the RAMPFIT
and the BREAKFIT algorithms twice (once for each change point). The results are 
comparable (Extended Data Table 1), and in the main text we report the average 
of the two methods. 
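
The ramp model that RAMPFIT assumes can be written down and fitted with a simple brute-force search, as sketched below. This is not the RAMPFIT routine itself (which, per the table legend, also provides bootstrap uncertainties); it is an unweighted least-squares illustration with change points restricted to the sample times, and all names and test values are hypothetical.

```python
def ramp(t, t1, x1, t2, x2):
    """Ramp model: x1 before t1, linear between t1 and t2, x2 after t2."""
    if t <= t1:
        return x1
    if t >= t2:
        return x2
    return x1 + (x2 - x1) * (t - t1) / (t2 - t1)

def fit_ramp(times, values):
    """Brute-force least-squares ramp fit; change points restricted to sample times."""
    best = None
    for i in range(len(times) - 1):
        for j in range(i + 1, len(times)):
            t1, t2 = times[i], times[j]
            # Weight w is 0 before t1 and 1 after t2; the model is x1*(1-w) + x2*w,
            # which is linear in x1 and x2, so solve the 2x2 normal equations.
            w = [ramp(t, t1, 0.0, t2, 1.0) for t in times]
            a = sum((1 - wi) ** 2 for wi in w)
            b = sum((1 - wi) * wi for wi in w)
            c = sum(wi ** 2 for wi in w)
            p = sum((1 - wi) * v for wi, v in zip(w, values))
            q = sum(wi * v for wi, v in zip(w, values))
            det = a * c - b * b
            if det == 0:
                continue
            x1 = (p * c - q * b) / det
            x2 = (a * q - b * p) / det
            sse = sum((ramp(t, t1, x1, t2, x2) - v) ** 2
                      for t, v in zip(times, values))
            if best is None or sse < best[0]:
                best = (sse, t1, x1, t2, x2)
    return best[1:]

# Noise-free hypothetical ramp between t = 20 yr and t = 140 yr.
times = list(range(-300, 301, 20))
values = [ramp(t, 20, 0.0, 140, 1.0) for t in times]
t1, x1, t2, x2 = fit_ramp(times, values)
print(t1, t2)
```

BREAKFIT differs in that it fits a single change point with a linear slope on either side, which is why the text pairs it with the gradual oceanic mode.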

Code availability. The MATLAB code used for the stacking procedure can be 
found in the supplementary information of ref. > and is available from the corre- 
sponding author upon request. 


Data availability 

Source Data (WDC sulfur data, volcanic tie points and water isotope data on syn- 
chronized chronologies) and derived products (stacks, PCA results, etc.) are availa- 
ble in the online version of the paper and in the NOAA palaeoclimate data archive. 

Extended Data Fig. 1 | Volcanic synchronization of Antarctic ice cores.
a, Age offset between the WD2014°”*8 (WDC) and AICC2012*° (TAL,
EDML, EDC) age scales, with each dot representing a volcanic tie point.
Yellow and blue triangles denote the timing of TAL-EDC and EDML-EDC
volcanic ties*””®, respectively. b, Overview of synchronizations
between the ice cores used in this study. Grey arrows indicate previously
published synchronizations and coloured arrows denote synchronizations
performed here. Synchronizations within Antarctica are based purely on
volcanic links; synchronization between WDC and NGRIP (Greenland)
is based on atmospheric CH4 (green arrow).

Extended Data Fig. 3 | Site-specific stacks of δ¹⁸O and d_ln. a, Stack of
NGRIP δ¹⁸O (teal; left axis) and WDC CH4 (green; right axis) during DO
warming. b, As in a, but for DO cooling. c, Stack of Antarctic δ¹⁸O at the
indicated locations (see key) during DO warming. d, As in c, but for DO
cooling. e, Stack of Antarctic d_ln at the indicated locations during DO
warming. f, As in e, but for DO cooling. Isotope ratios are on the VSMOW
scale.

Extended Data Fig. 4 | Proportionality of the atmospheric response.
a-f, Comparison of stacks of just the major DO events (those following
Heinrich events, namely, DO events 0, 1, 4, 8, 12 and 14; a, c, e) and just
the minor DO events (the remainder; b, d, f). a, b, Stacks of NGRIP
δ¹⁸O (left axis) and CH4 (right axis). c, d, Stacks of Antarctic δ¹⁸O at
the indicated locations. e, f, As in c and d, but with the Antarctic mean
subtracted. g, Proportionality of the atmospheric response for individual
events (numbered). The NGRIP event size is found via regression of
individual NGRIP events to the multi-event NGRIP δ¹⁸O stack normalized
to unit variance (Fig. 2a). The Antarctic atmospheric response is found
via multiple linear regression of single-site individual events to the
atmospheric and oceanic modes (Fig. 2e). Shown are the average (dots)
and standard deviation (error bars) of the response at EDC, DF and EDML
(the cores with the strongest atmospheric response); the EDML response
is multiplied by −1 because it has the opposite sign of the response at
DF and EDC. Red and blue dots denote the major and minor DO events,
respectively. Isotope ratios are on the VSMOW scale.

© 2018 Springer Nature Limited. All rights reserved.


[Figure: Extended Data Fig. 7 plot. Recoverable labels: NGRIP; Plateau; Ant. mean; EDML; PC1 (89%); PC2 (9%); x axis: Time (year), −600 to 400.]

Extended Data Fig. 7 | Antarctic climate response to DO cooling.
a, Stack of NGRIP δ¹⁸O. b, Stack of Antarctic δ¹⁸O at indicated locations.
c, As in b, but with the Antarctic mean subtracted. d, First two principal
components of the Antarctic δ¹⁸O stacks, with the fraction of variance
explained (offset for clarity). The lines show the BREAKFIT (PC1) and
RAMPFIT (PC2) fits. e, f, Empirical orthogonal functions EOF1 and EOF2
associated with PC1 and PC2 in d, scaled to show the magnitude in units
of δ¹⁸O (‰). Isotope ratios are on the VSMOW scale.


[Figure: Extended Data Fig. 8 plot. Recoverable panel titles: Dome Fuji; TALDICE; x axis: Latitude, 70° S to 30° S.]
Extended Data Fig. 8 | Moisture sources of Antarctic ice cores and the
SAM. a, Mass-weighted probability distribution functions of Antarctic
moisture sources for the five ice cores of interest (5 x 107° deg? contour
lines; the area-integrated probability distribution function equals 1).
Distributions are calculated from reanalysis data***’ using a Lagrangian
source diagnostic?)** (Methods). Parallels are plotted in 15° increments
of latitude and meridians in 45° increments of longitude. b, SST (black)®
and relative humidity (RH; grey)”° as a function of latitude. The coloured
curves give the latitudinal source distribution during a negative SAM
phase (solid curves; SAM index <0) and a positive SAM phase (dashed
curves; SAM index >0). The solid and open dots show the first moment
of the source distribution during negative and positive SAM phases,
respectively. We note that during a positive SAM phase, moisture sources
for all core locations are located closer to the Antarctic continent. Source
distribution data were obtained using water tagging experiments’ in the
CAM (Methods).

[Figure: Extended Data Fig. 9 plot. Recoverable panel titles: ERA-Interim; CCSM3 Internal variability; CCSM3 Forced.]
… metres per second per standard deviation in the index. b, As in a, but for
internal variability in the CCSM3 TraCE model simulation during … 19 kyr BP).

Extended Data Table 1 | Change-point analysis of Antarctic response

Parameter     Response to   t_oce (yr)   t_atm1 (yr)   t_atm2 (yr)   Routine
δ¹⁸O stack    DO warming    226 ± 23                                 BREAKFIT
δ¹⁸O PC1      DO warming    222 ± 24                                 BREAKFIT
δ¹⁸O ocean    DO warming    201 ± 12                                 BREAKFIT
δ¹⁸O stack    DO cooling    225 ± 19                                 BREAKFIT
δ¹⁸O PC1      DO cooling    234 ± 21                                 BREAKFIT
δ¹⁸O PC2      DO warming                 28 ± 3        137 ± 3       RAMPFIT
δ¹⁸O atmos    DO warming                 28 ± 3        137 ± 3       RAMPFIT
d_ln stack    DO warming                 −11 ± 34      193 ± 27      RAMPFIT
d_ln stack    DO warming                 28 ± 14       248 ± 14      BREAKFIT (2×)
CCSM3 SAM     DO warming                 13 ± 38       68 ± 48       RAMPFIT
δ¹⁸O PC2      DO cooling                 171 ± 5       209 ± 5       RAMPFIT
d_ln stack    DO cooling                 6 ± 16        151 ± 19      RAMPFIT
d_ln stack    DO cooling                 12 ± 5        166 ± 6       BREAKFIT (2×)
CCSM3 SAM     DO cooling                 20 ± 28       81 ± 29       RAMPFIT

Change points are found using the BREAKFIT®® or RAMPFIT®® routine, as indicated. The parameter toce is the single change point of the oceanic mode; tarmi and tarm2 are the two change points of the 
atmospheric mode, representing the onset and ending of the shift, respectively (Methods). Stated uncertainties give the lo value in the fitting routine only, found using a Monte Carlo moving-block 
bootstrap analysis with 1,000 iterations®®>5®; the full uncertainty in the combined interpolar CHa synchronization and stacking procedure?® is estimated to be around 40 yr (1c), which can mostly be 
attributed to uncertainty in the WAIS Divide ice age—gas age difference, Aage?”. The 6!80 PC2 and §!80 atmospheric modes are identical. 
Similar cranial trauma prevalence among
Neanderthals and Upper Palaeolithic modern humans


Judith Beier1, Nils Anthes2, Joachim Wahl1,3 & Katerina Harvati1,4*


Neanderthals are commonly depicted as leading dangerous lives 
and permanently struggling for survival. This view largely relies 
on the high incidences of trauma that have been reported!” and 
have variously been attributed to violent social behaviour*, 
highly mobile hunter-gatherer lifestyles” or attacks by carnivores?. 
The described Neanderthal pattern of predominantly cranial 
injuries is further thought to reflect violent encounters with large 
prey mammals, resulting from the use of close-range hunting 
weapons!. These interpretations directly shape our understanding 
of Neanderthal lifestyles, health and hunting abilities, yet mainly 
rest on descriptive, case-based evidence. Quantitative, population- 
level studies of traumatic injuries are rare. Here we reassess the 
hypothesis of higher cranial trauma prevalence among Neanderthals 
using a population-level approach—accounting for preservation bias 
and other contextual data—and an exhaustive fossil database. We 
show that Neanderthals and early Upper Palaeolithic anatomically 
modern humans exhibit similar overall incidences of cranial 
trauma, which are higher for males in both taxa, consistent with 
patterns shown by later populations of modern humans. Beyond 
these similarities, we observed species-specific, age-related variation 
in trauma prevalence, suggesting that there were differences in 
the timing of injuries during life or that there was a differential 
mortality risk of trauma survivors in the two groups. Finally, our 
results highlight the importance of preservation bias in studies of 
trauma prevalence. 

Neanderthals are commonly depicted as robust hominins who led 
stressful, dangerous lives!*°. Traumatic injuries, considered to be 
common among remains of adult Neanderthals’, are a major piece of 
evidence supporting this hypothesis: not only are Neanderthals pro- 
posed to suffer from a high prevalence of trauma”?!"', but they are 
also thought to exhibit more traumatic injuries than early modern 
humans”!”!3. Explanations for this include violent social behaviour*4,
a highly mobile hunter-gatherer lifestyle in glacial environments” and 
attacks by carnivores’. Moreover, Neanderthals are thought to show 
unusually high levels of head and neck injuries, attributed to their 
hypothesized reliance on close-range hunting leading to confrontations 
with large prey mammals!. These interpretations have important impli- 
cations for reconstructions of Neanderthal palaeobiology and behav- 
iour, and have shaped the prevailing perception of the species. However, 
they are largely based on anecdotal evidence, because trauma among 
Palaeolithic humans is often reported on a descriptive, case-by-case 
basis. The few systematic, quantitative studies that have been conducted 
to date have yielded contradictory results”*!+'+!°, but question the 
prevailing view of ‘the highly traumatized Neandertal’. 

Current research into Palaeolithic trauma suffers from several lim- 
itations. Most previous work assessed the proportional distribution of 
lesions throughout the body in injured Neanderthal skeletons, compar-
ing the derived ratios to those of recent humans>-*-!”. Such approaches
provide insights into individual life histories, but—because they focus 
exclusively on the injured—cannot elucidate population-level trauma 
prevalence. The latter requires an examination of both injured and 


uninjured individuals. Furthermore, contextual factors such as age at 
death, sex and skeletal preservation are rarely accounted for in these 
approaches’. These variables can markedly affect trauma prevalence 
variation!®-?! and lesion visibility in the fossil record, and should thus 
be integral to population-level analyses. Moreover, Neanderthals are 
routinely compared to recent humans—clinical! or forensic> samples, 
rodeo riders! and Holocene hunter-gatherers or nomads”*!>'®—but 
only rarely to Upper Palaeolithic modern humans!’. However, the latter 
are the most appropriate comparative sample, sharing similar environ- 
ments and comparable mobile hunter-gatherer lifestyles. Finally, small 
sample sizes have hampered the validity of the statistical inferences of 
most previous research. 

Here we assess the hypothesis of higher prevalence of cranial 
trauma among Neanderthals relative to Upper Palaeolithic modern 
humans using a population-level comparison, including contextual 
data and using the largest fossil dataset that is currently available. 
We systematically compiled published information on fossil crania 
from the Middle and Upper Palaeolithic of Eurasia, dating to roughly 
80-20 thousand years ago (Fig. 1). Cranial injuries—considered typical 
for Neanderthals!—are a particularly reliable trauma archive, because 
they heal with only minor bone remodelling and therefore leave visible 
lesions even after full recovery”. 

For each specimen, we recorded whether trauma was present (0 or 1), 
the taxon (Neanderthals or Upper Palaeolithic modern humans), sex 
(male, female or unknown), age at death (juvenile to young adult, 
old adult or indeterminate), preserved skeletal element(s) (14 major 
cranial bones), the preservation percentage of each skeletal element 
(<25%, 25-50%, 50-75% and 75-100%) and location (five geograph- 
ical regions within Eurasia; see Supplementary Tables 1, 2). We then 
used generalized linear mixed models (GLMMs) to assess differences in 
trauma prevalence with taxon, sex, age and preservation as explanatory 
variables, while accounting for variation among skeletal elements and 
locations. 

Our systematic literature survey revealed 21 specimens with one 
or several cranial lesions (9 Neanderthals and 12 Upper Palaeolithic 
modern humans; Supplementary Table 3) in our sample of 114 speci- 
mens of Neanderthals and 90 specimens of Upper Palaeolithic modern 
humans (Supplementary Tables 1, 2). At the level of skeletal elements, 
this corresponds to 14 out of 295 cranial elements of Neanderthals, and 
25 out of 541 cranial elements of Upper Palaeolithic modern humans, 
exhibiting at least one traumatic lesion. 
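
As a quick check on the counts reported above, the crude (unmodelled) prevalence at each level can be computed directly. This is only an illustrative Python sketch, not the authors' analysis (which used GLMMs in R); the names `counts`, `crude_prevalence` and `crude` are introduced here for illustration. Crude ratios ignore sex, age and preservation, which is why the models described below are needed.

```python
# Counts as reported in the text: (units with trauma, total units).
counts = {
    "specimens": {"Neanderthal": (9, 114), "Upper Palaeolithic": (12, 90)},
    "elements": {"Neanderthal": (14, 295), "Upper Palaeolithic": (25, 541)},
}


def crude_prevalence(injured: int, total: int) -> float:
    """Share of units (specimens or elements) with at least one lesion."""
    if total <= 0:
        raise ValueError("total must be positive")
    return injured / total


# Nested dict: level -> taxon -> crude prevalence.
crude = {
    level: {taxon: crude_prevalence(*pair) for taxon, pair in taxa.items()}
    for level, taxa in counts.items()
}

for level, taxa in crude.items():
    for taxon, value in taxa.items():
        print(f"{level:9s} {taxon:18s} {value:.3f}")
```

The totals sum to the 204 specimens and 836 skeletal elements given in the Methods.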

We calculated separate models to predict trauma prevalence at 
the specimen and the skeletal-element level. Our analysis comprised 
two sets of four GLMM models each that were based on hierarchi- 
cally nested subsets of the raw data. The first set (models 1-4; Fig. 2) 
followed an element-based approach, with skeletal elements being 
the unit of analysis; the second set (models 5-8; Fig. 3) was based on 
individuals (see Methods). Trauma was modelled as a binary response 
variable in all models, either per skeletal element or per specimen. The 
random component of the GLMMs comprised skeletal element and 
location in models 1-4, and only location in models 5-8. 


1Paleoanthropology, Senckenberg Centre for Human Evolution and Palaeoenvironment, University of Tübingen, Tübingen, Germany. 2Animal Evolutionary Ecology Group, Institute of Evolution
and Ecology, University of Tübingen, Tübingen, Germany. 3State Office for Cultural Heritage Management Baden-Württemberg, Osteology, Konstanz, Germany. 4DFG Center for Advanced Studies
‘Words, Bones, Genes, Tools’, University of Tübingen, Tübingen, Germany. *e-mail: katerina.harvati@ifu.uni-tuebingen.de
 1 Gibraltar (1,10)            14 La Ferrassie (2,14)     27 Sarstedt (2,2)
 2 Hora (1,1)                  15 La Chapelle (1,12)      28 Kulna (2,2)
 3 Palomas (3,9)               16 Combe Grenal (6,6)      29 Ochoz (1,2)
 4 Cova Forada (1,3)           17 Monsempron (2,2)        30 Vindija (26,33)
 5 Cova Negra (2,2)            18 Hortus (2,3)            31 Subalyuk (1,3)
 6 Gegant (1,2)                19 Fate (2,2)              32 Grotta Breuil (1,1)
 7 Petit-Puymoyen (3,4)        20 Ciota Ciara (1,1)       33 Guattari (3,14)
 8 La Quina-Amont (9,21)       21 Cotencher (1,2)         34 Chagyrskaya (1,1)
 9 Pradelles/Marillac (16,19)  22 Genay (1,5)             35 Zaskalnaya VI (1,1)
10 St. Césaire (1,8)           23 Spy (2,16)              36 Sakajia (1,1)
11 Fontéchevade (1,3)          24 Zeeland Ridges (1,1)    37 Shanidar (6,49)
12 Régourdou (1,2)             25 Neanderthal (1,8)       38 Amud (3,14)
13 Le Moustier (1,12)          26 Warendorf (1,1)         39 Kebara (2,3)

40 Caldeirão (1,1)             53 Barma Grande (3,28)         66 Tapolca (1,1)
41 Parpallo (1,11)             54 Caviglione (1,13)           67 Cioclovina (1,8)
42 Isturitz (1,2)              55 Grotte des Enfants (2,28)   68 Oase (2,14)
43 Brassempouy (1,2)           56 Arene Candide (1,13)        69 Muierii (2,12)
44 Vilhonneur (1,4)            57 Grotta Paglicci (9,20)      70 Bacho Kiro (1,1)
45 Fontéchevade (1,1)          58 Ostuni (1,14)               71 Buran Kaya III (3,4)
46 Cussac (1,9)                59 Mladeč (9,42)               72 Kostenki (3,24)
47 Cro Magnon (3,29)           60 Predmost (11,85)            73 Sunghir (3,26)
48 Abri Pataud (1,14)          61 Brno (2,11)                 74 Pokrovka (1,1)
49 Mollet (1,5)                62 Pavlov (3,13)               75 Ohalo II (1,14)
50 Crouzade (2,3)              63 Dolní Věstonice (6,67)      76 el-Wad (3,3)
51 La Balauzière (2,8)         64 Willendorf (1,2)
52 Baousso da Torre (1,4)      65 Vindija (3,4)


Fig. 1 | Neanderthal and Upper Palaeolithic modern human sites.
Neanderthal sites, blue triangles; Upper Palaeolithic modern human sites,
red dots. Numbers in brackets indicate number of specimens/number of
skeletal elements, respectively. Sites Chagyrskaya (34) and Pokrovka (74)
were projected 2,670 and 2,975 km west for better visualization. The map
was generated using the QGIS Geographic Information System
(https://www.qgis.org) and Natural Earth (http://naturalearthdata.com/).


Fig. 2 | Predicted cranial trauma prevalence in skeletal elements from
Neanderthals and Upper Palaeolithic modern humans. a, Model 1
includes taxon as the predictor variable (full dataset, n = 836). b, Model
2 includes the variables taxon, sex, age, element preservation and the
interaction between age and taxon, but excludes sex unknown and age
indeterminate skeletal elements (n = 604). c, Model 3 includes taxon as the
variable, but excludes female and sex unknown skeletal elements (n = 462).
d, Model 4 includes the variables taxon, age, element preservation, and
the interaction between age and taxon, but excludes female, sex unknown
and age indeterminate skeletal elements (n = 407). Predictions are given
for skeletal elements when 50-75% complete; predictions for other
preservation categories scale linearly. Predictions are based on posterior
estimates of the four GLMMs using a Markov chain Monte Carlo (MCMC)
algorithm. Sample sizes represent single skeletal elements, treated as
biologically independent samples in models 1-4 (see Methods). Markers
denote the predicted means, bars show lower and upper 95% credible
intervals. NEA, Neanderthals; UPH, Upper Palaeolithic modern humans.


Fig. 3 | Predicted cranial trauma prevalence in individual cranial
specimens from Neanderthal and Upper Palaeolithic modern humans.
a, Model 5 includes taxon as the predictor variable (full dataset, n = 204).
b, Model 6 includes the variables taxon, sex, age, specimen preservation
and the interaction between age and taxon, but excludes sex unknown
and age indeterminate specimens (n = 89). c, Model 7 includes taxon as
the variable, but excludes female and sex unknown specimens (n = 76).
d, Model 8 includes the variables taxon, age, specimen preservation and
the interaction between age and taxon, but excludes female, sex unknown
and age indeterminate specimens (n = 59). Predictions are given for mean
specimen-preservation scores; predictions for other preservation scores
scale linearly. Predictions are based on posterior estimates of the four
GLMMs using a Markov chain Monte Carlo algorithm. Sample sizes
in models 5-8 represent cranial specimens, comprising one or several
skeletal elements of the same cranium (see Methods). Markers denote the
predicted means, bars indicate lower and upper 95% credible intervals.


Table 1 | Summary statistics of the GLMMs

                                       Parameter estimates
Model    n     Predictor variable      Posterior mean  Lower 95% CI  Upper 95% CI  PMCMC
Model 1  836a  Taxon                    0.020          −0.889         0.933        0.965
Model 2  604b  Taxon                   −0.060          −2.017         1.687        0.949
               Sex                      1.515           0.178         2.921        0.017**
               Age                     −0.973          −2.154         0.210        0.100
               Element preservation     0.866           0.232         1.514        0.006***
               Age × taxon              2.595           0.573         4.645        0.008***
Model 3  462c  Taxon                    0.052          −1.167         1.329        0.940
Model 4  407d  Taxon                    0.220          −1.934         2.439        0.863
               Age                     −0.340          −1.553         1.050        0.605
               Element preservation     0.671           0.048         1.376        0.037**
               Age × taxon              2.149           0.048         4.355        0.046**
Model 5  204a  Taxon                   −0.651          −1.719         0.472        0.231
Model 6  89b   Taxon                   −0.715          −2.864         1.650        0.522
               Sex                      3.533           0.865         6.397        0.002***
               Age                     −1.490          −3.454         0.561        0.137
               Specimen preservation    0.882           0.054         1.730        0.032**
               Age × taxon              2.019          −1.190         5.030        0.196
Model 7  76c   Taxon                   −0.743          −2.443         0.749        0.354
Model 8  59d   Taxon                   −0.513          −2.902         1.858        0.660
               Age                     −1.153          −3.333         0.736        0.255
               Specimen preservation    0.739          −0.106         1.623        0.082*
               Age × taxon              1.584          −1.762         4.621        0.320

Trauma prevalence was modelled using an MCMC algorithm in two model sets with four data subsets each: models 1-4 comprise skeletal elements, models 5-8 comprise cranial specimens. Parameter
estimates are given as their posterior mean with 95% credible intervals (CI) and statistical significance (PMCMC; ***P < 0.01, **P < 0.05, *P < 0.10). See Methods for details.
aFull dataset.
bExclusion of sex unknown and age indeterminate elements or specimens.
cExclusion of female and sex unknown elements or specimens.
dExclusion of female, sex unknown and age indeterminate elements or specimens.


Model 1 comprised the full dataset of all skeletal elements (n = 836)
to exclusively assess overall taxon differences in trauma prevalence,
while ignoring the incompletely scored contextual variables. Model 2
(n = 604) excluded skeletal elements of unknown sex and indeterminate
age, thus assessing the additional influence of age, sex, element
preservation, and the interaction between age and taxon. Given trauma
predominance in males, we repeated these models on male-only subsets
in models 3 (n = 462) and 4 (n = 407).

Model 5 comprised all specimens (n = 204) and, corresponding
to model 1, assessed overall taxon differences in trauma prevalence.
Model 6 (n = 89) excluded specimens of unknown sex and indeterminate
age to assess how age, sex, specimen preservation and the interaction
between age and taxon affected trauma prevalence. We repeated
these models for male-only subsets in models 7 (n = 76) and 8 (n = 59).

None of the models showed a quantitative difference in cranial trauma
prevalence between Neanderthals and Upper Palaeolithic modern
humans (taxon effect in models 1-8 in Table 1 and Figs. 2a-d, 3a-d).
Instead, we found a significantly higher prevalence of trauma in males
compared to females (sex effect in models 2 and 6; Table 1 and Figs. 2b, 3b).
Furthermore, trauma prevalence significantly increased with preservation
status, indicating a greater probability of detecting trauma on
more complete skeletal elements or individuals (preservation effect in
models 2, 4, 6 and 8; Table 1 and Extended Data Fig. 1a). Finally, in the
element-based models, trauma prevalence varied between age classes
with distinct patterns for the two taxa (age-by-taxon interaction
in models 2 and 4; Table 1, Fig. 2b, d and Extended Data Fig. 1b):
Neanderthals had a significantly higher prevalence of trauma when
young, whereas Upper Palaeolithic modern humans showed a similar
prevalence of trauma across age cohorts. Although a similar pattern
appeared to be present in the specimen-level models (Fig. 3b, d), the
interaction failed to reach statistical significance.

The mean model-predicted prevalence of trauma for skeletal ele- 
ments in preservation category 50-75% was between 0.03 and 0.17 
(95% credible interval, 0.0002-0.39) for Neanderthals, and between 
0.02 and 0.12 (95% credible interval, 0.00006-0.35) for Upper 
Palaeolithic modern humans (Fig. 2a—-d). For specimens, predictions 
were calculated for the mean specimen preservation score (a proxy for 
skull completeness and average preservation category of its constituent 
elements; see Methods). These model-predicted trauma prevalence 
values ranged between 0.04 and 0.33 (95% credible interval, 0.000002- 
0.62) for Neanderthals and between 0.02 and 0.34 (95% credible inter- 
val, 0.000001-0.62) for Upper Palaeolithic modern humans (Fig. 3a—d). 

On the basis of our results, we reject the hypothesis that Neanderthals 
exhibit more cranial trauma than Upper Palaeolithic modern humans 
in western Eurasia—rather, we show that the two taxa exhibited a sim- 
ilar overall prevalence of cranial injuries. Previously suggested values 
of 30-40% cranial trauma prevalence for Neanderthals*'° represent 
the very limit of the predictions of our models for Neanderthals (mean 
prevalence of 3-17% for skeletal elements and 4-33% for individual 
specimens); these values are comparable to those found for Upper 
Palaeolithic modern humans (2-12% for skeletal elements and 2-34% 
for individual specimens) and that have been reported for Mesolithic 
hunter-gatherers”’, Neolithic agriculturalists”** and recent hunter- 
gatherers”°. Nevertheless, trauma prevalence derived from skeletal 
remains must not be equated to the actual numbers of injuries that were 
experienced during an individual's lifetime and comparisons of crude 
trauma frequencies should be considered with caution, because the 
methods used for their estimation are not always comparable among 
studies. 

The significant relationship between trauma prevalence and sex in 
both taxa is consistent with observations of greater trauma prevalence 
among males in later periods'®?!***’, generally explained by sex- 
specific differences in activities and behaviours (division of labour, 
initiation rites or violent conflict)'®”°". Trauma prevalence was further
affected by the preservation state of skeletal remains; more complete 
crania or cranial elements were more likely to have preserved traumatic 
lesions. We therefore caution against quantitative trauma analyses that 
do not address preservation bias. 

Both taxa showed mostly healed traumata and we did not find a 
markedly higher prevalence of trauma among ‘old’ skeletal elements 
in either group. This finding contradicts the expectation that healed 
traumatic injuries accumulate with increasing age as a result of longer 
exposure to dangerous situations”*, given that cranial defects remain 
visible over long-term periods owing to the limited regenerative bridg- 
ing capacity of cranial bone healing”. However, death assemblages 
are likely to deviate from such an expected accumulation pattern””»”, 
because injured individuals—even if they survived their injuries— 
had an increased risk of dying relative to individuals who were never 
injured?}?. Thus, our observed age pattern across taxa is consistent
with the well-documented increased mortality risk of trauma survivors. 

An interaction between age and taxon in trauma prevalence was 
found by our element-based analysis. For Neanderthals, this result 
suggests that cranial trauma was sustained early in life (before 30 years 
of age) and that trauma survivors were more likely to die while still 
‘young’, therefore accumulating in the ‘young’ age cohort in the fossil
record. Once a trauma is healed, it is not possible to determine when
it was acquired. Therefore, Upper Palaeolithic modern humans were
either less likely to sustain trauma than Neanderthals when ‘young’,
and/or they sustained trauma at a similar frequency when ‘young’, but
‘young’ Upper Palaeolithic modern human trauma survivors had a
lower mortality risk relative to ‘young’ Neanderthal trauma survivors.
In other words, ‘young’ Upper Palaeolithic modern human injured
individuals had a greater probability of surviving into the ‘old’ age cohort.
Possible explanations for these patterns include cultural or individual 
differences in injury proneness and healing, and different long-term 
consequences of healed trauma, resulting from (for example) differences
in injury severity or differential treatment of the injured, which
did not, however, affect the overall prevalence of trauma.

Our study addresses the controversial topic of trauma prevalence 
in the Palaeolithic by reassessing cranial trauma data using a state-of- 
the-art methodological approach. It is, to our knowledge, the largest 
population-level investigation of Neanderthal cranial trauma to date 
and accounts for differential skeletal preservation and contextual 
explanatory variables using Upper Palaeolithic modern humans as a 
comparative sample. The available evidence indicates similar overall 
trauma prevalence in Neanderthals and Upper Palaeolithic modern 
humans in western Eurasia, rejecting earlier hypotheses of highly 
traumatized Neanderthals. Beyond this overall similarity, our observed 
age-dependent differences between the taxa also suggest possible 
differences in the likely age of trauma acquisition or in the mortality 
risk of trauma survivors. 
METHODS


Data collection. We collected data through a comprehensive literature review and 
aimed at gathering a full-evidence dataset comprising all currently known fossil 
crania with and without traumatic lesions. We focused on Eurasian Middle and 
Upper Palaeolithic sites that had yielded skull remains from classic Neanderthals 
(around 80-30 thousand years ago) and early to mid-Upper Palaeolithic modern 
humans (around 35-20 thousand years ago) (Fig. 1; Supplementary Tables 1, 2 
provide information on the studied specimens). We excluded specimens that 
consisted of only dental remains and restricted our sample to adolescent and 
adult specimens with a minimum estimated age-at-death of 12 years’. For each 
specimen we recorded the taxon (Neanderthal or Upper Palaeolithic modern 
human), sex (male, female or unknown), age (young, 12-30 years; old, >30 years; 
or indeterminate, if there was no further estimate published), the skeletal element 
with its preservation status (see ‘Quantification’) and whether the skeletal element
was affected by trauma (binary). Because trauma prevalence may vary across 
geographical regions owing to differing social or environmental conditions, we 
furthermore recorded the location of each specimen (five geographical regions: 
Iberia, south, central, east, Near East). We adopted the assignments of taxon, sex, 
age and the diagnoses of traumatic lesions as published by the examiners of the 
specimens. These literature-based assignments may be influenced by observer 
bias or by the use of different methods. Nevertheless, we decided in favour of a 
full-evidence approach based on all available published data in order to keep 
data collection as consistent and complete as possible. Moreover, many fossil 
specimens are not available for original examination, precluding a single-method- 
based systematic assessment. We conducted an extensive literature review seeking 
to combine past research with the most recent results, so as to base our data 
on a complete synthesis of all available evidence, representing best-practice of 
research in the field. Notably, we expect misclassifications of traumatic lesions, 
age or sex to be equally likely in Neanderthals and Upper Palaeolithic modern 
humans, and this therefore should not introduce systematic biases into our group 
comparisons. Supplementary Table 3, a catalogue of specimens with described 
traumata, provides detailed descriptions of each lesion as published by the respec- 
tive authors. A case was recorded as (possible) trauma once an author expressed 
confidence that a lesion represents a trauma, or considered a traumatic origin 
to be an alternative explanation for an observed lesion. No statistical methods 
were used to predetermine sample size. The investigators were not blinded to 
allocation during analyses. 

Quantification. Skeletal preservation has a direct effect on the census of trauma 
prevalence, because an injury is more likely to be detected on a more complete 
bone*4. In chronologically older fragments, the preservation of skeletal remains 
commonly deteriorates and fragmentation of both single bones and associated 
skeletons increases. Moreover, the assignment of fragmented and commingled 
remains to specific individuals is often impossible or insecure. To account for 
differential skeletal preservation among sites and specimens, and to remove 
bias between geologically older Neanderthals and younger Upper Palaeolithic 
modern humans, we quantified the preservation status for each of the 14 major 
skull bones, that is, skeletal elements, separately. These are the frontal and occip- 
ital bones, as well as the left and right elements of the parietal, temporal, maxilla, 
mandible, zygomatic and nasal bones. Except for the zygomatic and nasal bones, 
we rated the completeness of skeletal elements in four preservation categories: up 
to 25%, 25-50%, 50-75% and 75-100%. Owing to their small size, the left and right 
zygomatic and the nasal bones were rated in just two categories: up to 50% and 
50-100%. We performed the quantification procedure by visually judging the pre- 
served portion of a given skeletal element in comparison to its complete equivalent 
using published pictures, sketches, virtual representations and verbal anatomical 
descriptions. Skeletal elements for which the preservation could not be quantified 
were excluded from the sample. In total, we collected data on 836 skeletal elements 
from 204 specimens. The quantification revealed a differential preservation among 
skeletal remains of Neanderthals and Upper Palaeolithic modern humans, with 
Neanderthals being biased towards incompletely preserved skeletal remains (see 
Extended Data Fig. 2a-e). 

Statistical methods. We predicted trauma prevalence using GLMMs. To obtain 
robust GLMM estimates despite a large proportion of trauma absences (zeros) in 
our dataset, we used a Markov chain Monte Carlo (MCMC) algorithm as imple- 
mented in the MCMCglmm package* for R version 3.4.3°°. Trauma presence or 
absence was modelled as a binary response variable with a binomial error distri- 
bution using a logit-link function. 
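
To make the logit link concrete, the sketch below converts between probabilities and the linear-predictor scale on which the estimates in Table 1 are reported. It is an illustrative Python example, not the authors' R code. The baseline prevalence of 0.05 is a hypothetical value chosen for illustration, and the calculation ignores any rescaling that a latent-scale model with fixed residual variance may require, so the result should be read as a rough guide only.

```python
import math


def logit(p: float) -> float:
    """Map a probability in (0, 1) to the log-odds scale."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def inv_logit(eta: float) -> float:
    """Map a log-odds value back to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


# Posterior mean of the sex effect in model 2 (Table 1), on the logit scale.
SEX_EFFECT_MODEL_2 = 1.515

# Hypothetical baseline prevalence (an assumption, not a value from the paper).
baseline = 0.05

# Adding the effect on the logit scale and transforming back to a probability.
shifted = inv_logit(logit(baseline) + SEX_EFFECT_MODEL_2)
print(f"baseline {baseline:.3f} -> shifted {shifted:.3f}")
```

Because the link is nonlinear, the same effect on the logit scale produces a different change in probability depending on the baseline.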

Our statistical analysis of trauma prevalence comprised two sets of four 
GLMMs on subsets of the raw data. The first set (models 1-4) followed a skeletal 
element-level approach, whereas the second set (models 5-8) represented an indi- 
vidual specimen-level approach. 

Element-level models (models 1-4). We entered the two-level predictors taxon 
(Neanderthals or Upper Palaeolithic modern humans), age (young or old, with 
30 years as the cut-off) and sex (male or female), as well as the z-transformed 
four-level covariate element preservation (0.25, 0.5, 0.75 and 1) as fixed predictor
variables. Visual data inspection indicated a potential for variation in the taxon 
effect with age class but not with sex, so we added the age-by-taxon interaction 
to the models. 
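
The z-transformation of the preservation covariate mentioned above can be sketched as follows. This Python example is illustrative only: the input values are hypothetical, and the use of the sample standard deviation (n − 1 denominator, as in R's default scaling) is an assumption, because the text does not state which was used.

```python
from statistics import mean, stdev


def z_transform(values):
    """Centre values on their mean and scale by the sample standard deviation."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 denominator)
    if s == 0:
        raise ValueError("cannot z-transform a constant covariate")
    return [(v - m) / s for v in values]


# Hypothetical preservation scores using the four levels named in the text.
preservation = [0.25, 0.5, 0.5, 0.75, 1.0, 1.0]
z = z_transform(preservation)
print([round(v, 3) for v in z])
```

After the transformation the covariate has mean 0 and standard deviation 1, so its coefficient describes the change in the linear predictor per standard deviation of preservation.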

Because traumata are not equally frequent in the different cranial regions 
we added intercepts for skeletal element as a random component for all 
element-level models, enabling us to derive marginal predictions for trauma 
prevalence beyond element identity while statistically accounting for variation 
in trauma prevalence between skeletal elements. Moreover, given that trauma 
prevalence may vary regionally, we added location as a second random intercept 
to the models. 

We ran four separate models to assess trauma prevalence using four data subsets 

and different explanatory variable combinations, while maintaining the same two 
random components in each case. Model 1 included taxon as the only fixed predictor. 
The exclusion of the other, incompletely scored, contextual predictor variables 
enabled us to analyse the full dataset of n = 836 skeletal elements. Model 2 included 
all fixed predictors, that is, taxon, age, sex, element preservation and the age- 
by-taxon interaction. We excluded all sex unknown and age indeterminate skeletal 
elements from model 2, resulting in a reduced sample of n = 604. Given the higher
prevalence of trauma in male individuals (see Fig. 2), we reproduced these two model
variants using a male-restricted data subset. In model 3 (n= 462), we exclusively 
tested for taxon differences, excluding female and sex unknown skeletal elements. 
Model 4 (n= 407) included the predictors taxon, age, element preservation and the 
age-by-taxon interaction. We excluded female, sex unknown and age indeterminate 
skeletal elements from this model. 
Specimen-level models (models 5-8). As a complementary conservative 
approach, we repeated the above analyses on the specimen level. This overcomes 
potential pseudo-replication of trauma incidence when lesions extend over mul- 
tiple skeletal elements of the same cranium, or a single cranium exhibits several 
lesions, but does not take variation in trauma incidences between skeletal elements 
into account. 

Specimen-level models 5-8 were identical to the element-based models 1-4, 
respectively, as described above. Cranial trauma presence or absence, however, was 
here scored at the level of specimens, resulting in sample sizes of n = 204 in model 
5, n= 89 in model 6, n=76 in model 7 and n=59 in model 8. The preservation 
score in these models (specimen preservation) is a combined proxy of skull com- 
pleteness and its average preservation category, calculated as the sum of all available 
element-based preservation scores divided by 14 skeletal elements. Location was 
added as the only random intercept in models 5-8. 

As suggested for binary response variables*®, we fixed the residual variance at 1
and chose an inverse Gamma prior for random effects**. Model parameters were
chosen to maximize model fit, visible with (i) an autocorrelation value** between
posterior parameter estimates <0.05; (ii) parameter estimates reaching conver-
gence between four independent model chains“ as reflected in the potential scale
reduction factor <1.01; and (iii) observed trauma prevalence falling within the
95% highest posterior density intervals of their respective posterior distribution. 
These criteria were met after 5,100,000 MCMC iterations, a burn-in of 100,000, 
and a thinning interval of 1,000, resulting in approximately 5,000 samples in all 
posterior distributions. From these posterior distributions, we derived the highest 
posterior density intervals (or credible intervals) for each parameter estimate and 
denoted them statistically significant (***P < 0.01, **P < 0.05) or statistical trend 
(#P < 0.10) when not including zero. These intervals formed the basis for statistical
inference and hypothesis testing. Plots in Fig. 2 show model predictions for
element-preservation category 50-75%; plots in Fig. 3 show the predicted trauma
prevalence for specimens at their mean preservation score. In both cases, pre- 
dictions linearly scale with the other preservation categories, generating overall 
slightly larger or smaller values but no change in the effect pattern for taxon, sex, 
age and the age-by-taxon interaction. 
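
The number of retained posterior samples follows directly from the chain settings reported above. The short Python check below reproduces that arithmetic; the function name is illustrative only.

```python
def retained_samples(iterations, burn_in, thin):
    """Posterior samples kept after discarding burn-in and thinning."""
    return (iterations - burn_in) // thin


# Settings reported in the text
print(retained_samples(5_100_000, 100_000, 1_000))  # 5000
```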

Reporting summary. Further information on research design is available in 
the Nature Research Reporting Summary linked to this paper. 

Code availability. The R code used to analyse the data in this study is available 
upon request from the corresponding author. 
Data availability 

Specimen-level data that support the findings of this study are provided in 
Supplementary Tables 1, 2. Quantification data for skeletal elements are available 
from the corresponding author upon reasonable request. Source Data for Figs. 2, 3 
and Extended Data Figs. 1, 2 are provided in the online version of the paper. 
each skeletal element for Neanderthals (b; full dataset, n = 295 skeletal 


© 2018 Springer Nature Limited. All rights reserved. 


LETTER 


https://doi.org/10.1038/s41586-018-0570-8 


Trans-differentiation of outer hair cells into inner 
hair cells in the absence of INSM1 


Teerawat Wiwatpanit, Sarah M. Lorenzen, Jorge A. Cantú, Chuan Zhi Foo, Ann K. Hogan, Freddie Marquez, 
John C. Clancy, Matthew J. Schipma, Mary Ann Cheatham, Anne Duggan & Jaime García-Añoveros 


The mammalian cochlea contains two types of mechanosensory hair 
cell that have different and critical functions in hearing. Inner hair 
cells (IHCs), which have an elaborate presynaptic apparatus, signal 
to cochlear neurons and communicate sound information to the 
brain. Outer hair cells (OHCs) mechanically amplify sound-induced 
vibrations, providing enhanced sensitivity to sound and sharp 
tuning. Cochlear hair cells are solely generated during development, 
and hair cell death—most often of OHCs—is the most common 
cause of deafness. OHCs and IHCs, together with supporting cells, 
originate in embryos from the prosensory region of the otocyst, but 
how hair cells differentiate into two different types is unknown)’. 
Here we show that Insm1, which encodes a zinc finger protein that 
is transiently expressed in nascent OHCs, consolidates their fate 
by preventing trans-differentiation into IHCs. In the absence of 
INSM1, many hair cells that are born as OHCs switch fates to become 
mature IHCs. To identify the genetic mechanisms by which Insm1 
operates, we compared the transcriptomes of immature IHCs and 
OHCs, and of OHCs with and without INSM1. In OHCs that lack 
INSM1, a set of genes is upregulated, most of which are normally 
preferentially expressed by IHCs. The homeotic cell transformation 
of OHCs without INSM1 into IHCs reveals a mechanism by which 
these neighbouring mechanosensory cells begin to differ: INSM1 
represses a core set of early IHC-enriched genes in embryonic OHCs 
and makes them unresponsive to an IHC-inducing gradient, so that 
they proceed to mature as OHCs. Without INSM1, some of the OHCs 
in which these few IHC-enriched transcripts are upregulated trans- 
differentiate into IHCs, identifying candidate genes for IHC-specific 
differentiation. 


Fig. 1 | Conditional ablation of Insm1 in hair cells results in IHC-like 
cells in place of OHCs. a, Wild-type (WT), floxed (F) and conditionally 
deleted (cKO) alleles of Insm1. Co-expression with Cre recombinase 
generates an Insm1 knockout allele lacking its coding sequence (CDS) 
and 3’ untranslated region (UTR), leaving only part of the 5’ UTR. Purple 
triangles, LoxP sites; red triangles, Frt sites. b-d, Hearing tests (b, c, mean 
with all values; d, mean ± s.e.m.). ABR thresholds (b), DPOAE thresholds (c), 
and iso-input functions (d) for the DPOAE of Atoh1Cre/+;Insm1F/F mice 
at P25-31 (black; n = 3 males and 2 females) and control littermates 
(red; n = 4 Insm1F/F and 4 Atoh1Cre/+;Insm1F/+; 6 males and 2 females). 
Up arrow (c) indicates that maximum sound was insufficient to reach 
threshold. e-l, Immunohistochemistry in organs of Corti revealed that 
the Atoh1Cre/+;Insm1F/F mice had normal IHCs but many OHCs expressed 
calmodulin and not oncomodulin (e, f, h), had stereociliary bundles 
resembling those of IHCs (f-actin labelled with phalloidin; e, f), expressed 
VGLUT3 instead of prestin (g; asterisk, one rare cell expressed both), 
had cell shapes (h) and large nuclei like IHCs (i, j, asterisks). j, Numbers of 
nuclei measured for each cell type are indicated under each data set. 
k, oc-IHCs had nuclear CtBP2 and a number of presynaptic ribbons 
(ribeye) approaching that of IHCs. l, In Atoh1Cre/+;Insm1F/F mice, 
oc-IHCs are more frequent in the first row of OHCs (closer to the IHCs) 
than the second or third rows. Data from two males, three females 
and seven undetermined (936 cells). m, n, Similar distribution of 
oc-IHCs in TgPax2Cre/+;Insm1F/F mice. n = 3 mice (234 cells). 
j, l, n, Mean ± s.e.m., one-tailed Student’s t-tests, n = number of mice. 
Images and quantifications from mid-cochlear positions at P34 (e, g, i), 
P46 (f, h), P21, P23 and P46 (k), and P14 and P15 (m). Controls were 
Insm1F/F (e, g, i, m) or Atoh1Cre/+;Insm1F/+ (f, j, k) littermates. Scale 
bars, 10 μm. Biological replicates were used for all experiments and similar 
results were obtained from three or more mice per genotype. 
Fig. 2 | Trans-differentiation of embryonic OHCs into embryonic IHCs 
in the absence of INSM1. a, GFP on embryonic organs of Corti indicates 
that the Insm1 promoter (expressing GFP.Cre) was active in oc-HCs 
(OHCs) at E16.5 and E18.5, regardless of the presence (Insm1GFP.Cre/+) 
or absence (Insm1GFP.Cre/-) of INSM1. b, c, Immunohistochemistry at 
E16.5-P2. While Bcl11b, an immature OHC-specific transcription factor, 
was found in all oc-HCs from control Insm1GFP.Cre/+ and Insm1GFP.Cre/- 
mice at E16.5 (b), its expression was diminished or undetectable 
(asterisks) in about half of oc-HCs in mice lacking INSM1 at E18.5 (b) 
and P2 (c). d, In situ hybridization revealed that in the absence of INSM1, 
a subset of oc-HCs began to express Fgf8 weakly as early as E16.5 (32.5%, 
110/338 OHCs from n = 3 TgPax2Cre/+;Insm1F/F or Atoh1Cre/+;Insm1F/F 
mice) and strongly by E17.5-E18.5 (40%, 52/130 OHCs from n = 2 


OHCs express Insm1 transiently from the onset of differentiation 
(embryonic day 15.5 (E15.5)) to approximately postnatal day 2 (P2)*. 
Neuronal progenitors and nascent spiral ganglion neurons also express 
Insm1*. Because mice in which Insm1 is completely knocked out die 
embryonically by E19.5°°, we generated an allele (Insm1F) in which the 
entire coding sequence can be deleted (Fig. 1a, Extended Data Fig. 1). 
We conditionally ablated Insm1F with Atoh1Cre, expressed from E13.5 
(two days before Insm1) and recombining in most cochlear hair cells 
and some supporting cells, but not in spiral ganglion neurons’. We 
also ablated Insm1F with TgPax2Cre, expressed earlier in the otocyst 
and recombining in most inner ear cells®. In these mice, Insm1 was 
ablated before its expression in OHCs (Extended Data Fig. 2). Both 
Atoh1Cre/+;Insm1F/F and TgPax2Cre/+;Insm1F/F (cKO) mice displayed 
alterations in auditory brainstem response (ABR) thresholds that can 
be accounted for by shifts in distortion product otoacoustic emissions 
(DPOAEs), a characteristic of OHC dysfunction (Fig. 1b-d, Extended 
Data Fig. 3a, b). In the organs of Corti of these mice, many cells in 
the positions of OHCs (the outer compartment) had features of IHCs. 
They had large stereocilia, like IHCs, and not the shorter, W-arranged 
stereocilia of OHCs (Fig. 1e, f); expressed the IHC-enriched calcium 
buffer calmodulin and lacked OHC-specific oncomodulin (Fig. 1e, f, h, 
Insm1GFP.Cre/- or Atoh1Cre/+;Insm1F/F mice) and P0-P4 (46.6%, 61/131 
OHCs from n = 3 TgPax2Cre/+;Insm1F/F or Atoh1Cre/+;Insm1F/F 
mice). e, Immunohistochemistry at P0 revealed that although in 
Atoh1Cre/+;Insm1F/F mice there was no HC loss and OHCs retained their 
characteristic inclination, their cell bodies were disorganized at the 
nuclear level. f, All OHCs at the bases of cochleae from Insm1F/F mice 
had neuroplastin in stereocilia (visualized with phalloidin). However, 
some OHCs at the base of cochleae from TgPax2Cre/+;Insm1Flox/Flox mice 
lacked neuroplastin. g, Conversely, the oc-HCs that lacked neuroplastin 
expressed VGLUT3. Images are from mid (a-e) or basal (f, g) cochleae. 
Scale bars, 10 μm. Biological replicates were used for all experiments and 
similar results were obtained from three or more mice per genotype. 


Extended Data Fig. 3f); expressed the vesicular glutamate transporter 3 
(VGLUT3), which is required for IHC presynaptic function, and lacked 
prestin, which is required for OHC electromotility (Fig. 1g, m); had the 
flask shape of IHCs rather than the cylindrical shape of OHCs; and had 
large nuclei, like IHCs, instead of the smaller nuclei of OHCs (Fig. 1i, j, 
Extended Data Fig. 3j). These nuclei harboured the transcription 
factor CtBP2, which is normally expressed in IHCs (Fig. 1k), and the 
cells contained a number of presynaptic ribbon synapses (10.6 ± 2.1 
(mean ± s.d.), n = 3 mice, 39 cells) closer to that found in control IHCs 
(16.3 ± 0.7, n = 3 littermate controls, 30 cells), instead of the smaller 
number found in OHCs (1.8 ± 0.2, n = 3 mice, 90 cells) (Fig. 1k). With 
rare exceptions (Fig. 1g), these abnormal cells displayed all IHC fea- 
tures examined and lacked those of OHCs, so we termed them oc-IHCs 
(outer compartment IHCs). 

The proportion of oc-IHCs in Atoh1Cre/+;Insm1F/F mice 
(42.6 ± 10.9%, n = 12 mice) and TgPax2Cre/+;Insm1F/F mice 
(46.0 ± 5.64%, n = 3 mice) was about half, the rest appearing as OHCs. 
This is not due to incomplete or delayed ablation of Insm1, because we 
did not detect Insm1 mRNA in any OHCs of TgPax2Cre/+;Insm1F/F mice 
during or after the onset of expression (E16.5; Extended Data Fig. 2a, 
b, bottom). Notably, the oc-IHCs were more prevalent in the first hair 
Fig. 3 | Genes preferentially expressed in immature IHCs or OHCs. 
a-d, Strategy for collecting separate pools of perinatal (P0) OHCs and IHCs 
by FACS. a, Crosses for generating pups in which live IHCs and OHCs can 
be fluorescently distinguished (green, GFP; red, tdTomato). b, Insm1GFP.Cre/+; 
Atoh1A1GFP/+;R26RtdTomato/+ mice express GFP in IHCs and GFP plus 
tdTomato in OHCs. Neurons express tdTomato but GFP from Insm1GFP.Cre 
has subsided. c, FACS separating IHCs (green) from OHCs (yellow) and 
neurons (red), done on six separate pools of IHCs and OHCs. d, RT-qPCR 
for IHC-specific Fgf8 and OHC-specific Insm1 confirms that these pools of 
cells are enriched for IHCs or OHCs, respectively. e, Logarithmic plot of 
genes preferentially expressed in either IHCs or OHCs based on RNA-seq 
values. Blue arrows indicate genes previously known to be HC subtype-specific 
in neonates*'”*>*°. We additionally confirmed perinatal OHC- or 
IHC-specific expression of Insm2 with a knock-in reporter line (E17.5; top 


cell row of the outer compartment than in the second or third rows 
(Fig. 1l, n). In principle, these oc-IHCs in mature organs of Corti lacking 
INSM1 could be displaced IHCs, newly generated IHCs replacing 
lost OHCs, IHCs born in the outer compartment, or OHCs that had 
trans-differentiated into IHCs. They were not displaced IHCs, as the 
IHC row in both cKO mice had a normal arrangement and density of 
IHCs (Extended Data Figs. 3c, 4a). Although during normal devel- 
opment cochlear hair cells are all born during embryogenesis (E12- 
E16)*’°, early hair cell death can trigger the generation of hair cells 
from proliferating and trans-differentiating supporting cells in the first 
few days after birth!!-), This does not occur in the absence of INSM1, 
as hair cell density in the outer compartment (OHCs + oc-IHCs) was 
unaltered up to P34 (Extended Data Figs. 3d, e, 4b, c), whereas oc-IHCs 
were present well before that. Second, hair cells that are derived post- 
natally from supporting cells initially express SOX2'?""4, whereas none 
of the oc-IHCs of cKO pups expressed SOX2 (Extended Data Fig. 4d). 
Third, some postnatally produced HCs result from proliferation of sup- 
porting cells'!“'4, but none of the oc-IHCs in cKO mice derived from 
postnatal proliferation (Extended Data Fig. 4e). These results show that 
oc-IHCs do not result from OHC death followed by replacement from 
displaced IHCs or postnatally generated hair cells. Instead, the oc-IHCs 
represent homeotic transformation (of mechanosensory OHCs into 
IHCs) due to a developmental defect in the generation or differentia- 
tion of OHCs. Either IHCs are generated in place of OHCs, or OHCs 
trans-differentiate into IHCs. 

We examined organs of Corti from mice with conditional (TgPax2Cre/+; 
Insm1F/F and Atoh1Cre/+;Insm1F/F) or complete (Insm1GFP.Cre/-) Insm1 
knockout during late embryogenesis, when OHCs and IHCs begin to 
differentiate. At E16.5, all cells in the outer compartment begin to express 
the earliest markers of OHCs: the Insm1 promoter in Insm1GFP.Cre/- 


inset). f, Comparison of fold difference in mRNA expression determined 
(for genes indicated as black boxes in e) by RNA-seq and RT-qPCR. 
Genes that encode transcription factors are underlined. IHC-specific 
genes potentially inhibited by INSM1 in embryonic OHCs are in blue. 

g, Venn diagrams indicating the number of genes enriched in either IHCs 
or OHCs of neonates versus adults (estimated from published results*’). 
Representative genes are shown in parentheses. Although neonatal HCs 
of either type begin to show expression of some functional markers 
characteristic of mature cells (Otof and Slc17a8 (also known as Vglut3) in 
IHCs and Slc26a5 (also known as Prestin) in OHCs), the majority of hair 
cell type-specific genes at this early stage differ from those of the mature 
cells. Biological replicates were used for all experiments and similar results 
obtained from three or more mice per genotype. 


embryos (which lack INSM1 but express GFP from the Insm1 pro- 
moter’; Fig. 2a), and BCL11B in nuclei (Fig. 2b, c). Whereas in control 
mice, BCL11B expression was maintained past birth, in embryos lack- 
ing INSM1 it subsided in nearly half of outer compartment hair cells 
(oc-HCs) (from E18.5 to P2; Fig. 2b, c). During the same period, many 
oc-HCs express the early IHC marker Fgf8 (Fig. 2d). Around birth, two 
additional markers begin to be expressed in control mice: neuroplastin, 
preferentially in OHC stereocilia!®, and VGLUT3 in IHCs!”. By comparison, 
in both cKO mice, many oc-HCs expressed VGLUT3 and not 
neuroplastin (Fig. 2f, g). Finally, although the orientation of IHCs and 
OHCs is maintained at birth in cKO mice, the disorganization of the 
OHC rows at the level of the nuclei already revealed alterations in cell 
shape (Fig. 2e). We conclude that in the absence of INSM1, oc-HCs 
are generated with early OHC features, but soon thereafter some of 
these cells lose these features, express early IHC markers, and proceed 
to differentiate into mature IHCs. This trans-differentiation of early 
OHCs into IHCs reveals that INSM1 is not required to initiate com- 
mitment to the OHC fate, but acts subsequently by preventing it from 
switching to that of IHCs. Insm1 acts by consolidating the OHC fate, 
making it permanent. 

Brief expression of Insm1 is sufficient to evade phenotypic con- 
version (Extended Data Fig. 5). It appears that Insm1 locks the OHC 
fate during a narrow developmental period. Curiously, although KO 
OHCs completely lack Insm1 from their birth, fewer than half of 
these cells trans-differentiate into IHCs. This trans-differentiation 
in TgPax2Cre/+;Insm1F/F and Atoh1Cre/+;Insm1F/F mice is more frequent 
in hair cell rows closer to the IHCs than in those further away 
(Fig. 1l, n). This distribution reveals the existence of a gradient in the 
neural to abneural axis of the organ of Corti that regulates cochlear 
hair cell types. This gradient might induce IHC differentiation, and 
Fig. 4 | Insm1 prevents expression of a subset of immature IHC-specific 
genes in embryonic OHCs. a, Identification of genes potentially regulated 
by INSM1 in embryonic to perinatal OHCs. Plot of average expression levels 
determined by RNA-seq from E18.5 OHCs expressing (Insm1GFP.Cre/+) or 
lacking (Insm1GFP.Cre/-) INSM1 (n = 3 pools of OHCs per genotype, each 
from 8-12 mice). Undetected transcripts are assigned an expression of 10^-1. 
We established as cutoff either an FDR-adjusted P < 0.05 (red squares) or, 
for a less stringent selection, a raw P< 0.01 (green squares). Blue diamonds 
represent all other transcripts. b-e, Venn diagrams indicating overlap 
between IHC- or OHC-enriched genes with genes that are presumably 
regulated by INSM1 in OHCs. Upregulated and downregulated genes are 
those overexpressed in OHCs with or without INSM1, respectively. The 
expected number of genes that would appear by random coincidence is in 
parentheses. Only e shows a larger overlap than randomly expected, pointing 
to 36 IHC-specific genes that appear to be downregulated by INSM1 in 
OHCs. f, Plot of the differential expression of genes in OHCs versus IHCs 
and OHCs with INSM1 versus OHCs without INSM1. Genes are plotted as 
dots with colours corresponding to the P value criteria used in a. Expression 
levels in OHCs with and without INSM1 (from which KO/WT changes are 
estimated) are average RNA-seq values of three pools of OHCs per genotype. 
Expression levels in IHCs and OHCs (from which IHCs/OHCs changes are 
estimated) are average RNA-seq values of six pools of each cell type. Blue 
dots along the x-axis are examples of the many genes enriched in either IHCs 
or OHCs that are not affected by INSM1. Differentially expressed genes 
confirmed by RT-qPCR are labelled in purple. Each gene is upregulated to 
a similar extent in IHCs (versus OHCs) as in OHCs lacking INSM1 (versus 
OHCs with INSM1). g, Graphic interpretation. Darker shades of blue indicate 
higher expression. h-o, ISH confirms preferential expression in IHCs (and 
often other cells of the organ of Corti) compared with OHCs, and increased 
expression in OHCs lacking INSM1. Scale bars, 10 μm. Biological replicates 
were used for all experiments and similar results obtained from three or 
more mice per genotype. 
INSM1 could act by preventing embryonic OHCs from responding 
to it. 

In other developing cell types, INSM1 functions as a transcriptional 
activator or repressor'**. We hypothesized that INSM1 directs OHCs 
to develop differentially from IHCs by activating OHC-specific genes 
or inhibiting IHC-specific genes. We first determined which genes 
were expressed in either differentiating hair cell type when Insm1 was 
expressed (Fig. 3), and then searched for genes regulated by INSM1 in 
developing OHCs (Fig. 4). For both approaches we used Insm1GFP.Cre, 
in which the coding sequence of Insm1 is replaced by that of a fusion 
protein between GFP and the Cre recombinase, thereby serving 
as a reporter as well as a null allele*”?. We generated Insm1GFP.Cre/+; 
Atoh1A1GFP/+;R26RtdTomato/+ mice, in which all hair cells express GFP 
(starting at E13.5 from Atoh1A1GFP and, in OHCs, from Insm1GFP.Cre) 
but only OHCs also express tdTomato following Insm1GFP.Cre expression 
(throughout the cochlea by E18.5%; Fig. 3a, b). We used these mice to 
sort OHCs and IHCs from neonatal (P0, approximately E19.5) organs 
of Corti (Fig. 3c). Using fluorescence-activated cell sorting (FACS), 
we collected pools of RNA from IHCs and OHCs (Fig. 3d), and then 
used RNA sequencing (RNA-seq) to obtain their transcriptomes 
(Supplementary Table 1). We thus identified 922 IHC-enriched genes 
and 676 OHC-enriched genes (Fig. 3e, Supplementary Tables 1-3). 
Among these were the 12 genes previously shown to be expressed 
preferentially in early IHCs or OHCs*!”5-”? (Fig. 3e), indicating that 
our approach detects most differentially expressed genes. One con- 
cern was whether genes that showed small differences in expression 
(twofold or less; for example, Zmat3), or that were detected at very 
low levels in one cell type only (for example, Sox18 and Msx1), were 
truly differentially expressed. We selected 21 transcripts (Fig. 3e), and 
used quantitative PCR with reverse transcription (RT-qPCR) to test for 
differential expression using additional pools of RNA from IHCs and 
OHCs. All 21 genes were confirmed to be differentially expressed, and 
the differences in expression were similar whether estimated by RNA- 
seq or RT-qPCR (Fig. 3f). We also confirmed differential expression of 
additional genes by methods not susceptible to potential artefacts of cell 
sorting and mRNA extraction: Bcl11b in OHCs by immunohistochem- 
istry (Fig. 2b); Insm2 in OHCs using an Insm2!" mouse line (Fig. 3e 
inset); and other genes by in situ hybridization (ISH) as preferentially 
expressed in OHCs (Neurod6, Sez6l) or IHCs (Tbx2, Id4, Rprm, Smad3, 
Car13, Brip1, Lrrn1, Pink1) (Fig. 4g-o). These results attest to the low 
prevalence of false positives among the genes we estimated as being 
differentially expressed between immature IHCs and OHCs. 

The transcriptomes of perinatal cochlear hair cells and supporting 
cells have been obtained, but these included a mixture of both OHCs 
and IHCs**”*. Although cell-type specific transcriptomes of mature 
IHCs and OHCs, obtained using microarrays, have also been pub- 
lished*°, we have obtained transcriptomes of these cell types before 
maturity, during early differentiation. A comparison of genes expressed 
in differentiating and mature IHCs and OHCs reveals very little overlap 
(Fig. 3g and Supplementary Table 4): only 5.9% of IHC-enriched and 
2% of OHC-enriched genes are differentially expressed at differentiat- 
ing and mature stages. These include some genes that are characteristic 
of the mature stage (Vglut3 and Otof in IHCs and Prestin in OHCs) 
but whose expression is incipient at birth. However, the vast majority 
of genes that were preferentially expressed in either cell type during 
differentiation (such as Insm1, Insm2 and Bcl11b in OHCs, and Brip1, 
Car13 and Fgf8 in IHCs) are not expressed upon maturation and vice 
versa. Thus a complex transcriptome, involving hundreds of genes, is 
transiently active during OHC- and IHC-specific differentiation. It is 
in this genetic context that INSM1 locks the fate of OHCs so that they 
proceed to differentiate into mature OHCs and not IHCs. 

To investigate how INSM1 drives OHC differentiation, we used RNA- 
seq to look for genes that were differentially expressed in differentiating 
OHCs with and without INSM1 (Insm1GFP.Cre/+ versus Insm1GFP.Cre/-) 
(Fig. 4a, Extended Data Fig. 6, Supplementary Table 5). We identified 
between 31 and 331 genes that could be differentially expressed (either 
upregulated or downregulated) by INSM1 (Supplementary Tables 6, 7). 
Comparison of these genes with those normally enriched in OHCs or 
IHCs (Fig. 4b-e), combined with RT-qPCR retesting (Supplementary 
Table 8) and ISH (Fig. 4f-o), showed that, in OHCs, INSM1 does not 
activate OHC genes but rather inhibits IHC genes. No upregulated 
genes were confirmed by RT-qPCR and, of the 22 downregulated 
genes confirmed, 21 are normally preferentially expressed by IHCs. The 
enrichment of these genes in wild-type IHCs is similar to their upregu- 
lation in OHCs lacking INSM1 (Fig. 4f, g, Extended Data Table 1). By 
contrast, most genes that are differentially expressed in OHCs versus 
IHCs were not affected by INSM1. We conclude that INSM1 down- 
regulates a specific subset of IHC-enriched genes in embryonic OHCs; 
without INSM1, those genes are expressed in embryonic OHCs, nearly 
half of which trans-differentiate into IHCs. 
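
The comparison with chance overlap reported in Fig. 4b-e can be illustrated with a simple calculation. Under the assumption that two gene sets are drawn independently from a common pool of detected genes, the expected overlap is the product of the set sizes divided by the pool size. The Python sketch below applies this to the 922 IHC-enriched genes; the size of the second set (100) and the pool size (20,000) are hypothetical placeholders, not values from this study, and the method actually used to obtain the numbers in parentheses in Fig. 4 may differ.

```python
def expected_overlap(set_a_size, set_b_size, universe_size):
    """Expected number of shared genes if the two sets are independent."""
    return set_a_size * set_b_size / universe_size


def fold_enrichment(observed, set_a_size, set_b_size, universe_size):
    """Observed overlap relative to the chance expectation."""
    return observed / expected_overlap(set_a_size, set_b_size, universe_size)


# 922 IHC-enriched genes (from the text); the other two values are
# hypothetical placeholders chosen only to make the example concrete.
print(expected_overlap(922, 100, 20_000))
```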

At E18.5, OHCs lacking INSM1 have not upregulated most of the 
early IHC-specific genes and still express early OHC-specific genes 
(Fig. 4f), even though many of these cells will, once differentiated, 
express all examined features and markers of IHCs and none of OHCs 
(Fig. 1e-k, m, Extended Data Fig. 3f). The small number of early IHC-specific 
genes (21 of 922, about 2%) that were upregulated in embryonic 
OHCs lacking INSM1 are likely to represent an early step in the 
genetic cascade that leads to their complete transformation into IHCs. 
As oc-HCs expressing these few genes differentiate into IHCs, these 
genes are likely to be required for IHC differentiation. Hence, in addi- 
tion to identifying Insm1 as a critical gene for OHC differentiation, 
our results also identify candidate genes for regulating the specific 
differentiation of IHCs. Because all OHCs express Insm1, but in its 
absence fewer than half trans-differentiate into IHCs, we expected two 
patterns of misexpression by ISH (Fig. 4g-o). Some genes (Rprm, Id4, 
Lrrn1, Car13, Pink1 and Brip1; Fig. 4h-n) were upregulated in all OHCs 
lacking INSM1, as expected if they were repressed by INSM1. These 
must include the genes whose disinhibition in the absence of INSM1 
renders embryonic OHCs susceptible to the gradient that induces IHC 
trans-differentiation. Other genes (Fgf8 and Tbx2; Figs. 2d, 4o) were 
upregulated only in fewer than half of oc-HCs, presumably those 
that would trans-differentiate into IHCs. These genes are some of the 
earliest expressed in IHCs, and are likely to include regulators of IHC 
differentiation. 

Our results reveal homeotic transformation of OHCs into IHCs in 
the absence of INSM1, identify the genes initially misregulated by abla- 
tion of Insm1, and provide a genetic mechanism for differentiation of 
these two cell types: nascent OHCs transiently express Insm1, which 
represses (directly or indirectly) a core set of early IHC-specific genes 
and renders the cells insensitive to an IHC-inducing gradient; this con- 
solidates the fate of OHCs by preventing their trans-differentiation into 
IHCs. 
METHODS 

Ethics and animals. All animal care and procedures were in strict accordance with 
the Guide for the Care and Use of Laboratory Animals published by the National 
Institutes of Health and were approved by Northwestern University’s Institutional 
Animal Care and Use Committee (Animal Study Protocols 1500001281 and 
1800000593). Mus musculus were used in this study. Animals were of CD1 and 
C57BL/6 genetic background. Similar numbers of male and female animals were 
used for all analyses, and these are stated in the manuscript. When sex was unde- 
termined, as in the case of embryos, all the embryos were blindly selected. For 
embryonic samples, we used animals at E16.5, E17.5 and E18.5. For neonatal animals, 
the average age was 9.27 ± 5.6 (s.d.) days. For adult animals, the average 
age was 34.32 ± 6.3 days. All tests were performed in mutant and littermate 
control mice, which were otherwise randomly chosen from the litters. Although 
the phenotype is obvious, quantifications of nuclear size, ribbon synapse densities, 
cell densities (Fig. 1, Extended Data Figs. 3, 4) and hearing tests (Fig. 1, Extended 
Data Figs. 3, 5) were performed without knowledge of genotype. Transcriptomic 
analyses were carried out by a biostatistician from a core facility with no knowledge 
of genotype. 

Generation of the Insm1 floxed allele for conditional ablation. The Insm1 target- 
ing construct was generated using a genomic BAC clone, 439G2, from the mouse 
129/SvEv genomic BAC library, RPCI-22. The Insm1 gene, including the coding 
sequence, 5’ and 3’ UTRs, a 2,790-bp 5’ homologous sequence and a 4,098-bp 3’ 
homologous sequence, was subcloned into the pL253 vector (IA1-pL253) using 
recombineering, as described previously*!. The recombined clone, IA1-pL253, 
was further modified using recombineering to add a LoxP recombination site 
immediately downstream of the 5’ UTR but before the Kozak sequence. A second 
Frt-NEO-Frt-LoxP site was recombined immediately downstream of the 3’ UTR. 
The completed targeting vector was sequence verified and sent to the Northwestern 
Transgenic and Targeted Mutagenesis Laboratory (Chicago, IL) for electroporation 
into SvEv 129 mouse embryonic stem cells. 

Using Q5 High-Fidelity Polymerase with GC Enhancer (NEB Catalog:M0491) 
and the primers WT: TCTTAGATTCTGCCCTTTCTGACAG; CKO: 
CCAAGGAGATGACCACGCATAG; and R2: CTCTTGTAGGGCCTCCTGTG, 
we performed a PCR to identify recombinant clones. Thermocycler conditions 
were: step 1: 98°C, 3:00 min; step 2: 98°C, 0:10 min; step 3: 65°C, 0:30 min, 
-1°C per cycle; step 4: 72°C, 6:45 min; repeat step 2, 10×; step 5: 98°C, 0:10 min; 
step 6: 60°C, 0:30 min; step 7: 72°C, 6:45 min; repeat step 5, 25×; step 8: 72°C, 
10:00 min; step 9: 4°C. The expected size for the wild-type allele using primers WT and R2 
was 6,163 bp. The expected size for recombinant clones using CKO-Reverse was 6,145 
bp. We screened a total of 439 clones and identified 5 recombinants. 
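
The annealing temperatures implied by the two-phase programme above can be listed with a short Python sketch. It assumes that "repeat step 2, 10×" and "repeat step 5, 25×" mean 10 and 25 cycles in total, respectively; some thermocyclers count the initial pass separately, in which case each phase would have one more cycle. The function and parameter names are illustrative.

```python
def annealing_schedule(touchdown_start=65, touchdown_step=1,
                       touchdown_cycles=10, fixed_temp=60, fixed_cycles=25):
    """Annealing temperature (in degrees C) for every cycle of the run.

    The first phase lowers the temperature by `touchdown_step` per cycle;
    the second phase holds it constant. Cycle counts are an assumed
    reading of the protocol text.
    """
    touchdown = [touchdown_start - touchdown_step * i
                 for i in range(touchdown_cycles)]
    fixed = [fixed_temp] * fixed_cycles
    return touchdown + fixed


schedule = annealing_schedule()
print(len(schedule))   # 35
print(schedule[:10])   # [65, 64, 63, 62, 61, 60, 59, 58, 57, 56]
```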

We further screened these five ES cell clones for recombination upstream of 
the 5’ LoxP site. DNA from selected recombinant clones was digested with the 
restriction enzyme SpeI (NEB Cat:R0133) and homologous recombination was 
confirmed by Southern hybridization. DNA was visualized using a 168-bp radio- 
labelled probe (as described®). The expected band sizes for wild-type and condi- 
tional knockout alleles are 18,162 bp and 14,333 bp, respectively. All five ES clones 
contained a targeted allele of Insm1. 

These clones were used for the generation of mosaic embryos, which were 
implanted into surrogate mothers by the Northwestern Transgenic and Targeted 
Mutagenesis Laboratory (Chicago, IL). From one of these clones (B3) we first generated 
chimeric mice, which were mated to mice expressing the FlpE recombinase 
(B6-Tg(CAG-FLPe)36)*” to delete the NEO cassette flanked by Frt sites and thus 
generate mice with a floxed allele of Insm1. 

Hearing tests. During testing, mice of both sexes aged P25-P31 were anaesthetized 
with ketamine and xylazine (120 mg/kg and 10 mg/kg, respectively, intraperitoneal 
(IP)) and their body temperature maintained using a heating blanket. In order to 
assay OHC function, DPOAEs were recorded using a custom probe equipped with 
a sensitive microphone (Knowles Electronics, FG-3652-CX). Responses were ana- 
lysed using Emission Averager (EMAV)**. Because the probe can be placed close 
to the eardrum, sound calibrations in the ear canal of each mouse were performed 
out to 48 kHz using a chirp stimulus generated in System Response (SysRes)**. All 
signals were generated using a CardDeluxe 24-bit sound card with a sampling rate 
of 96 kHz. Iso-input functions (f2/f1 = 1.2) at L1 = 50 and L2 = 35 dB were recorded 
for f2 frequencies between 2 and 47 kHz, thereby covering most of the mouse 
audiogram. Input-output functions were also acquired for various f2 frequencies 
(6, 12, 27 kHz), where L1 = L2 + 10 dB. Thresholds for 2f1-f2 were then calculated 
and represent the level of f1 that produces a DPOAE of 0 dB. After emission testing, 
neural responses were measured by collecting ABRs using tone-burst stimuli. The 
threshold was determined by noting the level at which the ABR waveform disap- 
peared into the noise. For these experiments, sound calibration was obtained using 
a real pinna coupler**. Further details were provided in a previous publication*. 
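
The stimulus arithmetic in this paragraph can be illustrated with a short Python sketch. It derives f1 and the distortion-product frequency 2f1−f2 from a chosen f2 using the stated ratio f2/f1 = 1.2, and applies the level rule L1 = L2 + 10 dB used for the input-output functions. The function names are hypothetical and are not part of the EMAV or SysRes software.

```python
def dpoae_primaries(f2_khz, ratio=1.2):
    """Return (f1, 2f1-f2) in kHz for a given f2 and f2/f1 ratio."""
    f1 = f2_khz / ratio          # lower primary tone
    dp = 2.0 * f1 - f2_khz       # cubic distortion product frequency
    return f1, dp


def l1_from_l2(l2_db, offset_db=10.0):
    """Level of f1 for the input-output functions (L1 = L2 + offset)."""
    return l2_db + offset_db


if __name__ == "__main__":
    # The three f2 frequencies used for the input-output functions.
    for f2 in (6.0, 12.0, 27.0):
        f1, dp = dpoae_primaries(f2)
        print(f"f2 = {f2:5.1f} kHz  f1 = {f1:6.2f} kHz  2f1-f2 = {dp:6.2f} kHz")
```

Note that the iso-input functions use fixed levels (L1 = 50 dB, L2 = 35 dB) rather than the 10-dB offset, so `l1_from_l2` applies only to the input-output recordings.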

Tissue collection and preparation. Neonatal mice were killed by decapitation, 
and cochleae dissected in cold HBSS with calcium and magnesium (Gibco). For 
embryos, timed pregnant dams were killed by isoflurane overdose followed by 
cervical dislocation. Their abdomens were opened to expose the uterus, which was 
dissected in cold HBSS with calcium and magnesium (Gibco); embryos were then 
collected and their cochleae removed. After dissection, neonatal and embryonic 
cochleae were processed depending on future use. For immunohistochemistry, 
embryonic and neonatal cochleae were fixed in 4% paraformaldehyde for 2 h at 
room temperature. For older tissues (>P20), mice were cardiac-perfused with 
4% paraformaldehyde, and cochleae were dissected and post-fixed in 4% para- 
formaldehyde for 2 h at room temperature. Cochleae from animals older than 
P5 were decalcified in 10% EDTA, pH 7.4, at 4°C until needed. Organs of Corti 
were dissected out from the cochleae into one apical, two middle and one basal 
sections using a whole-mount surface preparation method”. Frozen sections were 
processed as described’. 

Immunohistochemistry. Whole-mount organ of Corti sections were processed for 
immunohistochemistry as described previously**. Primary antibodies were mouse 
anti-calmodulin (1:100, C-7055, Sigma-Aldrich), goat anti-oncomodulin (1:200, 
sc-7446, Santa Cruz), rabbit anti-prestin (1:1,000, from J. Zheng, Northwestern 
University), guinea pig anti-VGLUT3 (1:2,500, from R. Edwards, University of 
California, San Francisco), mouse anti-CtBP2 (1:400, 612044, BD Biosciences), 
rabbit anti-myosin7a (1:800, 25-6790, Proteus Biosciences), sheep anti-neuroplastin 
(1:150, AF7818, R&D Systems), mouse anti-BCL11B (1:400, ab18465, abcam), 
and goat anti-SOX2 (1:500, sc-17320, Santa Cruz). For BCL11B immunolabelling 
on whole-mount cochlea, we performed antigen retrieval by incubating samples 
in 10 mM sodium citrate, pH 6, with 0.25% Triton X-100 for 20 min at 92 °C and 
cooling for 30 min at room temperature before blocking. For CtBP2 and SOX2 
immunolabelling, samples were prepared using a freeze-thaw method. In brief, 
organ of Corti sections were incubated in 30% sucrose at room temperature for 
20 min, placed at −80 °C for 5 min and thawed at room temperature for 20 min; 
sucrose was rinsed off with PBS before blocking and incubation with primary 
antibodies at 37 °C overnight. Nuclei were counterstained with 1:1,000 DAPI or 
1:2,000 Hoechst 33342. 

X-gal staining. X-gal staining to detect β-galactosidase expression on sections of 
Insm2'*~ embryos was performed as described”. 

Cell proliferation assay. In order to label hair and supporting cells generated from 
progenitors that proliferated postnatally, neonates were injected twice daily from 
P0 to P5 with the thymidine analogue 5-ethynyl-2′-deoxyuridine (EdU; 50 mg/kg 
in sterile saline). EdU incorporation into DNA was detected using Click-iT Plus 
EdU Alexa Fluor 555 Imaging Kit (Thermo Fisher Scientific) according to the
manufacturer’s manual. Following EdU detection, the samples were immunolabelled 
with antibodies as described above. 

Image acquisition and analysis. We acquired images on either a Nikon A1 or 
A1R+ Confocal imaging system using a 100× objective. 3D renderings were 
generated using NIS Elements AR4.60.00 (Nikon) and Imaris X64 8.4.1 (BitPlane) 
software. Nuclei and ribbon synapses were measured using built-in analysis func- 
tions on Imaris. After acquisition, we identically processed image pairs of control 
and knockout samples. This included adjustment for brightness, contrast and 
parameters for 3D volume and surface renderings of all images. 

FACS. For collection of OHCs or IHCs, organs of Corti were dissected from 
E18.5 embryos (Insm1^GFP.Cre/− or Insm1^GFP.Cre/+) or P0 neonates (Insm1^GFP.Cre/+; 
Atoh1^A1GFP/+;R26R^tdTomato/+) in ice-cold HBSS with calcium and magnesium 
(Gibco). A portion of tail from each embryo and neonate was collected for
genotyping. Organs of Corti were washed three times in cold 1× PBS and then 
digested in 0.33 U/ml papain, 0.5 mM EDTA and 1 mM L-cystine in EBSS for 
10 min at 37 °C, rinsed three times in 2% FBS and mechanically dissociated by gentle 
trituration (~100–150× with a P1000 pipette). Cell suspensions were kept on ice 
until sorting on a BD FACS Aria 4 flow cytometer through a 100-µm nozzle 
at speed 2 (<100 events/s). Hair cell populations were collected into RLT buffer 
(Qiagen, Valencia, CA). RNA was then isolated from cells using a Qiagen RNeasy 
Plus Micro Kit, or cells were stored at −80 °C until RNA isolation. Isolated RNA 
was evaluated for quality and concentration on a BioAnalyzer and stored at −80 °C. 

qRT-PCR. qRT-PCR was performed using either SYBR Green or TaqMan systems. 
Total RNA was extracted from pools of hair cells collected through FACS from
E18.5 Insm1^GFP.Cre/−, E18.5 Insm1^GFP.Cre/+ and P0 Insm1^GFP.Cre/+;Atoh1^A1GFP/+;R26R^tdTomato/+ 
mice. RNA was extracted using an RNeasy Plus Micro Kit (Qiagen) 
according to the manufacturer’s instructions. RNA quality was determined 
with a BioAnalyzer through NUSeq Core Facility, Northwestern University, 
Chicago, IL. 

For SYBR Green qRT-PCR, we used ~3,000 pg total RNA from each hair cell 
pool for first strand cDNA synthesis using iScript reverse transcription supermix 
(Bio-Rad) according to the manufacturer's manual. We then performed RT-qPCR 
with ~200 pg of first strand cDNA using SsoAdvanced Universal SYBR Green 
Supermix (Bio-Rad) in triplicate on a CFX Connect Real-Time PCR Detection 
System (Bio-Rad) using a 40-cycle protocol. 

For TaqMan qRT-PCR, we used 1 µg total RNA from each HC pool for first 
strand cDNA synthesis using SuperScript VILO cDNA Synthesis Kit with ezDNase 
Enzyme (Applied Biosystems) according to the manufacturer’s instructions. First 
strand cDNA was subjected to ezDNase inactivation using 1 µl of 100 mM DTT 
per reaction. Prior to qRT-PCR, we performed pre-amplification of first strand 
cDNA using TaqMan PreAmp Master Mix (Applied Biosystems) according to the 
manufacturer’s instructions. We then performed qRT-PCR using 1 µg diluted 
pre-amplified cDNA (1:20 in TE buffer) per reaction in triplicate on a QuantStudio 
7 Flex Real-Time system (Applied Biosystems) using a 14-cycle protocol at NUSeq 
Core Facility, Northwestern University, Chicago, IL. 

All primers used for qRT-PCR in this study were designed and pre-mixed to 
their optimal concentrations by BioRad and Applied Biosystems. RT-PCR reac- 
tions were performed according to the corresponding manufacturer’s instructions. 

RNA-seq and transcriptome analysis of embryonic OHCs and P0 hair cells. To 
purify enough RNA for deep sequencing and to analyse results statistically in order 
to determine the IHC and OHC transcriptomes, we collected by FACS six separate 
pools of IHCs (700–1,100 cells per pool) and six of OHCs (2,800–3,700 cells per 
pool) from Insm1^GFP.Cre/+;Atoh1^A1GFP/+;R26R^tdTomato/+ mice at P0 (generated by 
timed pregnancies and found to correspond in most cases to E19.5 and in the rest 
to E20.5). To determine the transcriptomes of OHCs with and without INSM1, 
we collected OHCs by FACS into three separate pools per genotype (Insm1^GFP.Cre/− 
and Insm1^GFP.Cre/+), each with 2,200–5,000 OHCs from 8–12 E18.5 embryos. We 
extracted 3–7.5 ng of RNA per E18.5 OHC pool, ~3 ng per P0 IHC pool and 
10–18 ng per P0 OHC pool. We used only samples with an RNA integrity number 
(RIN) >8. 

Beijing Genomics Institute (BGI) performed sample preparation and sequenc- 
ing at their facility in the Children’s Hospital of Philadelphia (CHOP). The total 
RNA samples were first treated with DNase I to degrade any possible DNA con- 
tamination followed by ribosomal RNA removal using RiboZero (Epicentre), con- 
verted to cDNA and amplified with NuGEN Ovation RNA-Amplification System 
V2. Mixed with the fragmentation buffer, the mRNA was fragmented into short 
fragments of about 200 bp. Then the first strand of cDNA was synthesized using 
random hexamer-primer. Buffer, dNTPs, RNase H, and DNA polymerase I were 
added to synthesize the second strand. Double-stranded cDNA was purified with 
magnetic beads, end repair and 3′-end addition of a single adenine (A) nucleotide 
were performed, and sequencing adaptors were ligated to the fragments, which were 
enriched by PCR amplification. Libraries were qualified and quantified with an Agilent 
2100 Bioanalyzer and ABI StepOnePlus Real-Time PCR. Individually barcoded 
100-bp paired-end library products were sequenced on the Illumina HiSeq2000 
(three libraries from E18.5 Insm1^GFP.Cre/− and three from Insm1^GFP.Cre/+ OHCs) 
or the HiSeq4000 (six libraries each for IHCs and OHCs from P0 Insm1^GFP.Cre/+; 
Atoh1^A1GFP/+;R26R^tdTomato/+ mice) and multiplexed per lane, yielding 48–50 
million (for each of the six E18.5 OHC libraries) and 92–116 million (for each of 
the twelve P0 IHC or OHC libraries) paired reads. DNA read quality was evaluated 
in fastq format using FastQC, adapters were trimmed, and reads of poor quality or 
aligning to rRNA sequences were filtered. 

The cleaned reads were aligned to the Mus musculus genome (mm10) using 
STAR“. Read counts for each gene were calculated using htseq-count*! in con- 
junction with a gene annotation file for mm10 obtained from UCSC (University 
of California Santa Cruz; http://genome.ucsc.edu). Differential expression was 
determined using DESeq2™. The cutoff for determining significantly differentially 
expressed genes was an FDR-adjusted P value less than 0.05. 
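
To make the significance cutoff concrete, the following Python sketch applies a Benjamini-Hochberg adjustment to a list of raw P values and flags those with an adjusted value below 0.05. It assumes the DESeq2 default adjustment method (Benjamini-Hochberg) was used; the function names are hypothetical and this is not the DESeq2 implementation, which also applies independent filtering before adjustment.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(pvalues)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant(pvalues, alpha=0.05):
    """Flag tests whose FDR-adjusted P value is below alpha."""
    return [q < alpha for q in benjamini_hochberg(pvalues)]


if __name__ == "__main__":
    raw = [0.5, 0.01, 0.2, 0.001]   # illustrative values, not data from the study
    print(benjamini_hochberg(raw))
    print(significant(raw))
```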

Reporting summary. Further information on research design is available in 
the Nature Research Reporting Summary linked to this paper. 


Data availability 


All data are available from the corresponding authors upon reasonable request. 
RNA-seq data are available for public view at the gEAR portal (https://umgear.org/). 

Extended Data Fig. 1 | Generation of a conditional KO allele of Insm1. 
a, We generated a targeting construct in which the sole exon of Insm1 
(green rectangle, with the coding sequence in dark green and the UTRs in 
light green) has a loxP site (purple triangle) inserted in a poorly conserved 
area of its 5′ UTR and another loxP site downstream of the Insm1 gene. The 
construct also incorporates a neomycin resistance cassette (NEO, blue) 
surrounded by Frt sites (red triangles) and a thymidine kinase cassette 
(HSV-TK; orange), which are used to select for recombination events after 
gene targeting. b, We screened 439 clones and identified 5 recombinants (1 
non-recombinant wild type, B5, and 3 recombinants, B6, E10 and B3, are 
shown) with PCR using primers indicated in a (arrows). The expected size 
for the wild-type allele using primers WT to R2 is 6,163 bp. The expected 
size for recombinant clones using CKO-reverse is 6,145 bp. 


c, Selected embryonic stem (ES) cell clones were additionally screened for 
homologous recombination upstream of the first loxP site by Southern 
blotting after digestion with SpeI and using the 5′ probe indicated in a. 
Southern blotting was performed twice. The expected band sizes for wild-type
and conditional KO alleles are 18,162 bp and 14,333 bp, respectively. 
From one of these clones (B3) we first generated chimeric mice and then 
mice with floxed alleles of Insm1 (obtained by crossing the chimaeras 
with mice expressing the FlpE recombinase (B6-Tg(CAG-FLPe)36), which 
deleted the NEO cassette flanked by Frt sites). Homozygous Insm1^F/F 
mice are viable, demonstrating that the loxP insertions do not interfere 
with the vital functions of Insm1 and hence may be used for its conditional 
ablation. Co-expression with Cre recombinase generates an Insm1 KO 
allele lacking its entire coding sequence. 

Extended Data Fig. 2 | Conditional ablation of Insm1 in cochleae. In situ 
hybridization for Insm1 transcripts on cryosections of embryonic E16.5 
and neonatal (P0 and P1) cochleae. a–c, In control cochleae (top), Insm1 is 
expressed in all OHCs (72/72 OHCs from 3 animals) and spiral ganglion 
(SG; white arrows) at E16.5. However, no Insm1 was detected in the organs 
of Corti from apical turns, in which recognizable hair cells have not yet 
appeared (a, c; asterisks). By postnatal age P0–P1, Insm1 mRNA is present 
in 90% of OHCs (94/105 OHCs from 2 animals), and it is undetectable in 
spiral ganglion (b, d). a, b, Bottom, in TgPax2^Cre/+;Insm1^F/F mice, Insm1 
mRNA is undetectable in spiral ganglion and in all OHCs from E16.5 (0/69 
OHCs from 2 mice; a) and P0 (0/42 OHCs from 1 mouse; b) cochleae. 
c, d, Bottom, in Atoh1^Cre/+;Insm1^F/F cochleae, Insm1 mRNA is present 
in spiral ganglion and 43% of OHCs (18/42 OHCs from 1 animal) at 
E16.5 (c), reduced to 7% (4/54) at E17.5 (not shown), and entirely absent 
from all OHCs (0/60 OHCs from 1 animal) and spiral ganglion by postnatal 
day P1 (d). For quantification at E16.5, we did not include organs of Corti 
from apical turns, which do not yet have recognizable hair cells. Filled 
arrowheads indicate organs of Corti with Insm1 expression, and empty 
arrowheads indicate organs of Corti without Insm1 expression. Scale bars, 
200 µm. 


© 2018 Springer Nature Limited. All rights reserved. 




[Figure, panels a–f. Recoverable labels: ABR thresholds and DPOAE thresholds plotted against frequency (6, 12 and 27 kHz; f2 frequency for DPOAEs); number of IHCs per 100 µm; calmodulin and DAPI immunofluorescence.]


Extended Data Fig. 3 | Conditional ablation of Insm1 in hair 
cells and spiral ganglia neurons using TgPax2^Cre causes hearing 
impairment and the appearance of IHC-like cells in place of OHCs. 
a, b, Hearing thresholds determined by ABRs (a) and DPOAEs (b) of 
TgPax2^Cre/+;Insm1^F/F mice at age P35–P46 (black traces; n = 4, 4 females) 
and control littermates (red traces; n = 5, 2 males and 3 females). The fact 
that shifts in ABR threshold are larger than shifts in DPOAE threshold 
may indicate an additional contribution to hearing impairment of the 
spiral ganglion neurons lacking INSM1 in TgPax2^Cre/+;Insm1^F/F cochlea. 
c, Despite the prevalence of OHCs with IHC characteristics (oc-IHCs) 
in TgPax2^Cre/+;Insm1^F/F cochleae (46.0 ± 5.64% (mean ± s.d.), n = 3 
mice; Fig. 1m, n), these mice have the same density of IHCs (9.87 ± 2.41 
cells per 100 µm along the organ of Corti; n = 3) as littermate controls 
(TgPax2^Cre/+;Insm1^F/+ and Insm1^F/F; 10.54 ± 1.67 cells per 100 µm; n = 3), 
suggesting that oc-IHCs are not IHCs displaced from the inner to the 
outer compartment. d, There is no OHC loss in TgPax2^Cre/+;Insm1^F/F 
mice at ~P14–P16. Densities of oc-HCs do not differ between 
TgPax2^Cre/+;Insm1^F/F mice (OHCs and oc-IHCs) and their littermate 
controls (OHCs only) (29.88 ± 7.45 cells per 100 µm along the organ of 
Corti of TgPax2^Cre/+;Insm1^F/F mice, n = 3; 34.3 ± 6.92 cells per 100 µm 
in TgPax2^Cre/+;Insm1^F/+ and Insm1^F/F littermate controls, n = 3). e, The 
number of oc-HCs per IHC does not differ between TgPax2^Cre/+;Insm1^F/F 
mice and their littermate controls (3.03 ± 0.25 OHCs plus oc-IHCs per 
IHC in TgPax2^Cre/+;Insm1^F/F mice, n = 3; 3.24 ± 0.21 OHCs per IHC 
in TgPax2^Cre/+;Insm1^F/+ and Insm1^F/F littermate controls, n = 3).
One-tailed Student’s t-tests were used in c–e; n is number of mice. Statistical 
significance is defined as P < 0.05. f, Immunofluorescence for the
IHC-enriched calmodulin (green) and hair cell marker myosin VIIa (white) 
on whole-mount organs of Corti from mid-cochlear positions at ages 
~P14–P16 confirmed that many TgPax2^Cre/+;Insm1^F/F oc-HCs had IHC 
characteristics, in addition to having the flask shape and large nuclei of 
IHCs (blue, DAPI, marked with asterisks), as well as lacking prestin and 
expressing VGLUT3 (Fig. 1m). Scale bars, 10 µm. Biological replicates 
were used for all experiments and similar immunohistochemistry results 
obtained from three or more mice per genotype. 

Extended Data Fig. 4 | IHC-like cells in the outer compartment 
(oc-IHCs) result from OHC misdifferentiation in the absence of 
INSM1, not from IHC displacement or from trans-differentiation 
of supporting cells. a, Atoh1^Cre/+;Insm1^F/F mice have the same density 
of IHCs (11.2 ± 1.2 (mean ± s.d.) cells per 100 µm; n = 12) as littermate 
controls (Atoh1^Cre/+;Insm1^F/+ and Insm1^F/F; 11.5 ± 1.3 cells per 100 µm; 
n = 13). b, There is no loss of OHCs in Atoh1^Cre/+;Insm1^F/F mice up to 
P34. Densities of oc-HCs do not differ between Atoh1^Cre/+;Insm1^F/F mice 
(OHCs + oc-IHCs) and their littermate controls (OHCs only) (34.6 ± 3.8 
cells per 100 µm in Atoh1^Cre/+;Insm1^F/F mice, n = 9; 37.3 ± 4.5 cells per 
100 µm in Atoh1^Cre/+;Insm1^F/+ and Insm1^F/F littermate controls, n = 10). 
c, The number of oc-HCs per IHC does not differ significantly between 
Atoh1^Cre/+;Insm1^F/F mice and their littermate controls (3.1 ± 0.3 OHCs and 
oc-IHCs per IHC in Atoh1^Cre/+;Insm1^F/F mice, n = 9; 3.3 ± 0.2 OHCs per 
IHC in Atoh1^Cre/+;Insm1^F/+ and Insm1^F/F littermate controls, n = 10). The 
criteria for identification of oc-IHCs in Atoh1^Cre/+;Insm1^F/F mice were the 
presence in the outer compartment of hair cells expressing IHC markers 
(VGLUT3, high levels of calmodulin and/or nuclear CtBP2), lacking OHC 
markers (oncomodulin and/or prestin) and with a shape (determined 
by myosin VIIa immunoreactivity) like that of IHCs. Mice used for hair 
cell counts were P0–P34. One-tailed Student’s t-tests were used in a–c; 
n is number of mice. Statistical significance is defined as P < 0.05. 
d, SOX2 immunoreactivity, which labels the nuclei of cochlear supporting 
cells and, under certain conditions, of hair cells trans-differentiated 
from them postnatally, was not present in cells of the OHC region in 
Atoh1^Cre/+;Insm1^F/F pups (0/95 OHCs at P0, 0/42 OHCs at P2, and 0/39 
OHCs at P5). e, To track postnatal cell proliferation in the organ of Corti, 
neonatal mice were injected twice daily with the thymidine analogue 
5-ethynyl-2′-deoxyuridine (EdU) from P0 to P5 or P8. The lack of EdU 
in any hair cell from Atoh1^Cre/+;Insm1^F/F mice (0/77 oc-HCs at P5 and 
0/40 oc-HCs at P8) confirmed that these cells, including oc-IHCs, were 
not produced from postnatally dividing supporting cells. Unless otherwise 
noted, images are from mid-cochlear positions. Hair cells were identified 
by myosin VIIa immunoreactivity, phalloidin, DAPI and Hoechst. Scale 
bars, 10 µm. Biological replicates were used for all experiments and similar 
immunohistochemistry results obtained from three or more mice per 
genotype. 

Extended Data Fig. 5 | Conditional ablation of Insm1 in hair cells using 
Gfi1-Cre causes no hearing impairment, and results in only very few 
oc-IHCs. We generated a conditional knockout of Insm1 in hair cells 
using Gfi1-Cre, in which the expression of Cre recombinase coincided 
with that of Insm1. a, b, Hearing thresholds determined by ABRs (a) and 
DPOAEs (b) of Gfi1^Cre/+;Insm1^F/F mice at age P30–P35 (black traces; 
n = 6, 4 males and 2 females) and control littermates (red traces; n = 6; 
1 Insm1^F/F, 1 Insm1^F/+ and 4 Gfi1^Cre/+;Insm1^F/+; 4 males and 2 females). 
There is no significant difference in ABR and DPOAE thresholds at 
any frequency tested between Gfi1^Cre/+;Insm1^F/F mice and their control 
littermates. c, Immunohistochemistry in whole-mount organs of Corti 
from mid-cochlear positions of P34 mice tested for hearing in a, b revealed 
that Gfi1^Cre/+;Insm1^F/F mice had normal IHCs expressing high levels 
of calmodulin. However, very few oc-HCs also expressed calmodulin 
at high levels, oncomodulin at low levels, and had a round, flask shape 
similar to that of IHCs (0.78%; 5/526 OHCs from 2 mice). Because in 
these Gfi1^Cre/+;Insm1^F/F mice the onset of Cre recombinase expression 
coincides with that of Insm1 (E15.5–E17.5)*, their nascent OHCs will 
express Insm1 for at least several hours. This result indicates that brief 
expression of Insm1 is sufficient to promote proper OHC differentiation. 
Scale bars, 10 µm. Biological replicates were used for all experiments and 
similar immunohistochemistry results obtained from three or more mice 
per genotype. 

Extended Data Fig. 6 | FACS purification of and RNA extraction from 
OHCs from an E18.5 Insm1^GFP.Cre/+ embryo. a–c, Forward and side light 
scattering were used to exclude dead cells and debris (a) and aggregates 
(≥2 cells) (b, c). d, Live cells were gated in green and red (PerCP-Cy5, to 
assess autofluorescence) channels to define the GFP+ (green dots) and 
GFP− (red dots) sorting windows. e, Myosin VIIa immunoreactivity and 
DAPI stain of cells collected through cytospinning after FACS confirm 
that most of the 547 sorted GFP+ cells are hair cells. This verification was 
done on all hair cell pools sorted (three pools per genotype). Inset is a 
representative merged image of one sorted OHC at high magnification. 
In this pool, no autofluorescent cells were collected. f, RT-qPCR after 
cell sorting (mean ± s.e.m.) reveals that, compared with GFP− cells, GFP+ 
cells express the hair cell marker gene Myo7a and not the supporting cell 
marker genes S100 and Hes5. g, To ensure the quality of the extracted 
RNA, the RIN score was determined using a BioAnalyzer. g, Similar RIN 
scores were obtained from all pools of OHCs examined (including the 
three per genotype used for RNA-seq in Fig. 4a). 

Extended Data Table 1 | Confirmed genes misregulated in OHCs lacking INSM1 

                                                                 RNA-seq   RNA-seq   RT-qPCR  RT-qPCR
                                                                  change    change    change  P value
Gene     Description                                           (IHC/OHC)  (KO/Het)  (KO/Het)  (KO/Het)
Tbx2     T-box 2                                                   19.28     13.75    102.31  0.000428
Fgf8     fibroblast growth factor 8                                18.02      8.12     65.08  0.004112
Smad3    SMAD family member 3                                      10.69     14.04     46.43  4.14E-05
Nhs      Nance-Horan syndrome (human)                               8.01     11.56      3.15  0.034775
Lrrn1    leucine rich repeat protein 1, neuronal                    7.86      4.93      7.79  0.003956
Brip1    BRCA1 interacting protein C-terminal helicase 1            7.47     12.65    118.50  5.53E-05
Rin3     Ras and Rab interactor 3                                   5.71     10.98      1.88  0.011625
Pcdh1    protocadherin 1                                            4.47      3.53      2.84  0.009309
Spryd3   SPRY domain containing 3                                   3.64      6.70      1.53  0.033057
Pacs1    phosphofurin acidic cluster sorting protein 1              3.56      2.63      2.33  0.00229
Car13    carbonic anhydrase 13                                      3.55      4.27      3.69  0.003681
Tmprss7  transmembrane serine protease 7                            3.03      3.07      1.58  0.033105
Rprm     reprimo, TP53 dependent G2 arrest mediator candidate       2.89      1.83      1.67  0.039198
Zfp668   zinc finger protein 668                                    1.85      3.23      1.30  0.044668
Mtss1    metastasis suppressor 1                                    1.83      2.34      1.56  0.02911
Cux1     cut-like homeobox 1                                        1.61      2.05      1.57  0.026945
Lrrc8b   leucine rich repeat containing 8 family, member B          1.56      1.71      1.31  0.045587
Rai1     retinoic acid induced 1                                    1.57      2.56      1.14  0.033262
Pink1    PTEN induced putative kinase 1                             1.36      6.52      4.42  0.003253
Cmtm8    CKLF-like MARVEL transmembrane domain containing 8         1.21      8.29      1.84  0.002469
Id4      inhibitor of DNA binding 4                                 0.99      1.94      1.84  0.063034
Sez6l    seizure related 6 homolog like                             0.23      7.47      5.04  0.020066


Differential expression of IHCs with respect to OHCs and of OHCs without INSM1 (from Insm1^GFP.Cre/− mice, referred to as KO) with respect to OHCs with INSM1 (from Insm1^GFP.Cre/+, referred to as Het). 
Differential expression between KO and Het OHCs estimated by RNA-seq was confirmed by TaqMan RT-qPCR (n = 5 pools of OHCs per genotype for Tbx2, Nhs, Lrrn1, Brip1 and Rin3; n = 4 for all other 
genes). P values are for one-tailed t-tests on RT-qPCR values. All 22 genes increase their expression in OHCs lacking INSM1. Of these, all except Sez6l are normally preferentially expressed in IHCs. 
Note that for Id4, differential expression between IHCs and OHCs was not detected by RNA-seq at P0, and it did not reach significance between KO and Het OHCs by RT-qPCR at E18.5. However, 
significance was achieved (P = 0.013) at E16.5, at which time differential expression was confirmed and visualized by RNAscope in situ hybridization (Fig. 4j). Hence, for Id4 the differential expression 
occurs transiently and very early. 




https://doi.org/10.1038/s41586-018-0728-4 


Helios is a key transcriptional regulator of outer hair cell maturation 

The sensory cells that are responsible for hearing include the 
cochlear inner hair cells (IHCs) and outer hair cells (OHCs), with the 
OHCs being necessary for sound sensitivity and tuning1. Both cell 
types are thought to arise from common progenitors; however, our 
understanding of the factors that control the fate of IHCs and OHCs 
remains limited. Here we identify Ikzf2 (which encodes Helios) as 
an essential transcription factor in mice that is required for OHC 
functional maturation and hearing. Helios is expressed in postnatal 
mouse OHCs, and in the cello mouse model a point mutation in Ikzf2 
causes early-onset sensorineural hearing loss. Ikzf2^cello/cello OHCs 
have greatly reduced prestin-dependent electromotile activity, a 
hallmark of OHC functional maturation, and show reduced levels 
of crucial OHC-expressed genes such as Slc26a5 (which encodes 
prestin) and Ocm. Moreover, we show that ectopic expression of 
Ikzf2 in IHCs: induces the expression of OHC-specific genes; reduces 
the expression of canonical IHC genes; and confers electromotility 
to IHCs, demonstrating that Ikzf2 can partially shift the IHC 
transcriptome towards an OHC-like identity. 

The mature mammalian cochlea contains two distinct types of sen- 
sory cells, IHCs and OHCs, each of which are highly specialized and, in 
humans, do not regenerate once they are damaged or lost”. Progressive 
loss of these cells, particularly the OHCs, underlies much of the 
aetiology of age-related hearing loss—a worldwide epidemic**. 
Although these two cell types were first described by Retzius in the 
1800s, the mechanisms that underlie the specification of their common 
progenitor cells to functional inner versus outer hair cells remain poorly 
understood. In addition, attempts to direct stem cells towards hair cell 
fates have, so far, resulted only in the formation of immature cells that 
lack many of the markers of mature IHCs or OHCs”. Given the vulner- 
ability of the OHCs, identifying factors that specify OHC fate is crucial, 
not only for understanding the biology of this unique cell type, but also 
for ultimately working towards regenerative therapies for hearing loss. 

To define a set of high-confidence OHC-expressed genes for 
downstream gene regulation analyses, we crossed the knock-in
prestin-CreERT2 mouse, which can be induced to express Cre recombinase 
specifically in OHCs, with a transgenic RiboTag mouse, to enable
OHC-specific ribosome immunoprecipitation®’. Cochlear ducts from the 
resulting RiboTag^HA/+;prestin^CreERT2/+ mice were collected at five postnatal
time points (postnatal day (P) 8, 14 and 28, and 6 and 10 weeks), 
and actively translated OHC transcripts were enriched for by ribosome 
immunoprecipitation, followed by RNA sequencing (RNA-seq) of all 
immunoprecipitated and paired input RNA (Extended Data Fig. 1a, b, 
Supplementary Table 1). We calculated an OHC enrichment factor 
based on the immunoprecipitated/input RNA log2 fold change for each 
gene at each time point (Supplementary Table 2). Reassuringly, known 
postnatal hair cell-enriched and OHC-expressed genes such as Pou4f3, 
Gfi1, Strc, Ocm and Slc26a5 generally had high enrichment factor values 
across all time points (enrichment factor (EF) > 1), whereas prominent 
IHC marker genes such as Otof, Atp2a3 and Slc17a8 were generally 
depleted from the immunoprecipitated samples (EF < −1). In addition, 
marker genes for supporting cells, neurons and otic mesenchyme were 
also depleted (Extended Data Fig. 1c). Further informatics analyses 
of our RiboTag OHC dataset demonstrated a systematic enrichment 
of OHC markers and a depletion of IHC markers previously identi- 
fied in an adult mouse OHC and IHC transcriptomic dataset®, and 
classified the OHC-enriched transcripts into three clusters (Extended 
Data Fig. 1d-f, Supplementary Table 3). Intersecting genes with tran- 
scripts that were enriched in OHCs in our most mature RiboTag OHC 
data point (10 weeks, EF > 0.5) compared with the published dataset® 
resulted in a list of 100 highly confident postnatal OHC markers that 
are significantly and consistently enriched in postnatal OHCs (Fig. 1a, 
Supplementary Table 4). We and others have previously shown that 
relevant transcriptional regulators can be discovered by analysing 
the promoters of cell-type-specific genes to identify statistically over- 
represented transcription factor-binding motifs®’®. A transcription 
factor-binding motif prediction analysis of the 100 OHC marker genes 
identified several enriched motifs in the 20-kb regions that centred 
around the transcription start site, the top five of which correspond to 
the transcription factors HNF4A, MZF1, POU3F2, Helios and RFX3¹¹. 
Of these, only Ikzf2 (which encodes Helios) was included in the list of 
100 OHC marker genes, and was found to be markedly enriched in 
OHCs at all time points (Fig. 1b, c), with an approximately fourfold 
enrichment in OHCs compared to IHCs in the previously published 
dataset® (Supplementary Table 4). Further characterization of Helios 
protein expression in the inner ear confirmed that it is restricted to 
the OHC nuclei starting from P4, and persists in functionally mature 
OHCs (Fig. 1d—f, Extended Data Fig. 2a). Together, these data suggest 
an important role for Helios in regulating the OHC transcriptome from 
early postnatal to adult stages. 
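
The enrichment factor used above is the log2 fold change of immunoprecipitated over input RNA, with EF > 1 read as enriched and EF < −1 as depleted. A minimal Python sketch of that calculation follows. The pseudocount, the function names and the example expression values are assumptions for illustration only; the normalization actually used by the authors is not described in this section.

```python
import math


def enrichment_factor(ip_expr, input_expr, pseudocount=1.0):
    """log2 fold change of immunoprecipitated over input expression.

    The pseudocount (an assumption here) avoids division by zero
    for genes with no reads in the input sample.
    """
    return math.log2((ip_expr + pseudocount) / (input_expr + pseudocount))


def classify(ef, enriched=1.0, depleted=-1.0):
    """Label a gene using the EF > 1 and EF < -1 thresholds from the text."""
    if ef > enriched:
        return "enriched"
    if ef < depleted:
        return "depleted"
    return "neither"


if __name__ == "__main__":
    # Hypothetical normalized expression values (IP, input).
    for name, ip, inp in [("geneA", 7.0, 1.0), ("geneB", 1.0, 7.0), ("geneC", 3.0, 3.0)]:
        ef = enrichment_factor(ip, inp)
        print(name, round(ef, 2), classify(ef))
```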

A recent phenotype-driven N-ethyl-N-nitrosourea (ENU) mutagenesis 
screen, undertaken at the MRC Harwell Institute, identified a C-to-A 
transversion at nucleotide 1551 of Ikzf2 in the cello mouse mutant, causing 
a non-synonymous histidine-to-glutamine substitution (p.H517Q) 
in the encoded Helios transcription factor’? (Fig. 1g, Extended Data 
Fig. 2b-d). A combination of in silico mutation analyses, structural 
3D modelling, immunolabelling of Helios in the cello mutant mice, 
and in vitro assays predicted and validated a deleterious effect of the 
cello mutation on the ability of Helios to dimerize, without impair- 
ing its cellular localization (Fig. 1g, Extended Data Figs. 2e and 3). 
We further investigated the functional role of Helios in hearing by 

Fig. 1 | Helios is a candidate regulator of OHC genes. a, The 100 

OHC marker genes (n = 100) are enriched in OHCs at all RiboTag 

OHC dataset time points compared to the expression of all other genes 
detected (background, BG) (n = 13,044). P values: P8 = 1.73 x 10777, 
P14=6.55 x 10°, P28 = 1.60 x 10-!8, 6 weeks (w) =7.79 x 10738, 

10 weeks = 1.43 x 10~*° (two-sided Wilcoxon’s test). Centre line 
represents median enrichment factor (EF; log fold change), box 
demarcates first and third quartiles, whiskers demarcate first and third 
quartile + 1.5 x interquartile range (IQR) values, dots denote single 
outliers. b, Transcription factor-binding motif analysis using the 100 
highly confident OHC marker genes identifies the binding signature for 
Helios as significantly overrepresented. Normalized enrichment score 
(NES) = 3.85; NES > 3.0 corresponds to a false discovery rate (FDR) of 
3-9%; see ref. 11. c, Ikzf2 transcript enrichment in OHCs as measured by
RiboTag OHC RNA-seq. d, Specific expression of Helios in the nuclei
of wild-type P8 OHCs (white arrows). n = 3 biologically independent
samples. Scale bar, 50 µm. OoC, organ of Corti; RM, Reissner's membrane;


assessing auditory brainstem response (ABR) thresholds in wild-type 
and cello mice across several time points. Results show that Ikzf2^cello/cello
mice have progressive deterioration of hearing function that starts as
early as P16 (>60 dB sound pressure level (SPL)), with a threshold
of >85 dB SPL by 9 months (Fig. 2a, b, Extended Data Fig. 4a-c). Using
scanning electron microscopy, we show that the ultrastructure of the 
cochlear sensory epithelia and hair cell stereocilia bundles in the cello 
mice appear normal up to 1 month of age, after which the OHC bun- 
dles, and later the IHC bundles, begin to degenerate (Extended Data 
Figs. 4d, 5a-d, Supplementary Tables 5, 6). These data indicate that the 
hearing impairment in cello mice precedes the loss of hair cell bun- 
dles, and suggest that the Helios mutation instead leads to a functional 
deficit in OHCs. Furthermore, by using a second Ikzf2 mutant allele
(Ikzf2^del890), which leads to an in-frame deletion of the third coding
exon, we confirm Ikzf2 as the causative gene that underlies the auditory
dysfunction in the cello mutants. At 1 month of age, Ikzf2^cello/del890
compound heterozygotes display increased ABR thresholds (up to 40 dB
SPL) compared to heterozygotes and wild-type mice (Extended Data
Fig. 5e, f), confirming Ikzf2^cello as the causative allele in the cello mutant.


[Fig. 1g graphic: Ikzf2 gene and domain structure (ATG, ZnF 1-4 DNA binding, ZnF 5-6 dimerization, TAG); ZnF6 sequence alignment for M. musculus, H. sapiens, P. troglodytes, G. gallus, T. rubripes and X. tropicalis; 3D models of ZnF6 with His517 (Ikzf2^+) and Gln517.]

SG, spiral ganglion; SL, spiral ligament; SV, stria vascularis. e, Helios
expression is maintained in wild-type OHCs at 1 month (white arrows).
n = 3 biologically independent samples. Scale bar, 10 µm. f, Helios is
detected in wild-type OHCs from P4 and is maintained in mature P16
OHCs. n = 2 (P3) and n = 4 (P4, P8 and P16) biologically independent
samples. Specificity is confirmed by the loss of labelling when the anti-Helios
antibody is ‘pre-blocked’ with its immunizing peptide. n = 5
biologically independent samples. Scale bars, 10 µm. HC, Hensen's
cells; PC, pillar cells. g, Top, the genomic and domain structure of Ikzf2.
Black, 5′ untranslated region; light grey, N-terminal DNA-binding domain;
dark grey, C-terminal dimerization domain. The Ikzf2^cello mutation lies
in ZnF6. Bottom, further alignment of the Helios ZnF6 sequence with its
paralogues and the classical Cys2His2 ZnF motif shows that the H517Q
cello mutation causes substitution of a highly conserved zinc-coordinating
histidine residue. 3D modelling of wild-type Ikzf2^+ ZnF6 and mutant
Ikzf2^cello ZnF6 illustrates the requirement of residue His517 for zinc
coordination, which is not possible when residue Gln517 is substituted.


To explore the effect of the cello mutation on OHC physiology, we 
investigated the basolateral properties of OHCs. We found that the 
mechanoelectrical transducer (MET) current (Extended Data Fig. 6a-c) 
and the adult-like potassium (K+) current I_K,n (Extended Data Fig. 6d-h)
have normal biophysical characteristics in Ikzf2^cello/cello OHCs. The
resting membrane potential (Vm) of OHCs is also similar between
genotypes (Ikzf2^cello/+: −68 ± 2 mV (mean ± s.e.m.); Ikzf2^cello/cello:
−70 ± 1 mV). We then investigated whether Helios regulates OHC electromotile
activity. We found that stepping the membrane potential from
−64 mV to +56 mV causes the OHCs from both genotypes to shorten
(Fig. 2c, d), as previously described'*-!°. However, Ikzf2^cello/cello OHCs
show significantly reduced movement compared to Ikzf2^cello/+ control
OHCs (Fig. 2e), even when the values are normalized to their reduced
surface area (Fig. 2f). We also found that young adult Ikzf2^cello/cello
mice have significantly reduced distortion product oto-acoustic
emission (DPOAE) responses (< −15 dB SPL) compared to littermate
controls (Fig. 2g), further demonstrating impaired OHC function.

To identify genes regulated by Helios in OHCs, we compared gene 
expression from the cochleae of P8 Ikzf2^cello/cello and their wild-type
Fig. 2 | Helios is required for hearing and OHC electromotility.
a, b, Averaged ABR thresholds for cello mice at P16 (a) and 1 and
9 months of age (b). Age-matched Ikzf2^+/+ and Ikzf2^cello/+ controls display
thresholds within the expected range (15-30 dB SPL) at all time points
tested. n = 4 (a) and n = 5 (b) biologically independent animals per
genotype at each time period. Data are mean ± s.e.m. ****P < 0.0001
(P16 Ikzf2^cello/cello vs Ikzf2^+/+ and vs Ikzf2^cello/+ at 8 kHz, 16 kHz, 32 kHz,
and click stimulus); *P = 0.0284 (1- vs 9-month Ikzf2^cello/cello at 8 kHz);
*P = 0.0166 (1- vs 9-month Ikzf2^cello/cello at 16 kHz); *P = 0.0303 (1- vs
9-month Ikzf2^cello/cello at 32 kHz); **P = 0.0042 (1- vs 9-month Ikzf2^cello/cello
click stimulus) (one-way ANOVA with Tukey post hoc test (a) or
two-sided Welch's t-test (b)). See also Extended Data Fig. 4. c, d, Left,
images show a patch pipette attached to an OHC from control Ikzf2^cello/+
(c) and mutant Ikzf2^cello/cello (d) cochleae at P16-P18. Red lines indicate
the position of the OHC basal membrane before (left) and during (right)
a depolarizing voltage step from −64 mV to +56 mV, highlighting
the shortening of the cells. Scale bars, 5 µm. Right, time-based z-stack
projections, in which red lines indicate the resting position of the basal
membrane and the green lines indicate the movement. n = 10 (Ikzf2^cello/+)
and n = 21 (Ikzf2^cello/cello) stack projections (one set per OHC) from 5
biologically independent animals per genotype. e, f, Average movement
was significantly reduced in Ikzf2^cello/cello OHCs compared to Ikzf2^cello/+
at P16-P18 (e), even after normalization to respective membrane
capacitance (f) (for this set of recordings, Ikzf2^cello/+: 13.6 ± 0.4 pF;
Ikzf2^cello/cello: 10.0 ± 0.3 pF). Data are mean ± s.e.m. n = 10 (Ikzf2^cello/+) and
n = 21 (Ikzf2^cello/cello) OHCs from 5 biologically independent animals per
genotype. ****P < 0.0001 (two-sided Welch's t-test). g, Average DPOAE
responses for cello mice at 1 month of age (n = 5 biologically independent
animals per genotype). Data are mean ± s.e.m. ****P < 0.0001 (Ikzf2^cello/cello
vs Ikzf2^+/+ and vs Ikzf2^cello/+ at 8 kHz, 16 kHz); ***P = 0.0004 (Ikzf2^cello/cello
vs Ikzf2^+/+ at 32 kHz); ***P = 0.0012 (Ikzf2^cello/cello vs Ikzf2^cello/+ at 32 kHz)
(one-way ANOVA with Tukey post hoc test). h, i, NanoString validations
of genes downregulated in Ikzf2^cello/cello cochleae at P8 (h) and results
showing no change in expression of other OHC transcription factors (i).
Data are normalized to wild-type (Ikzf2^+/+) and shown as mean ± s.d.
(n = 4 biologically independent samples per genotype). *P = 0.028
(Car7; Ikzf2^cello/cello vs Ikzf2^+/+); *P = 0.017 (Ppp1r17); **P = 0.006 (Ocm);
*P = 0.017 (Slc26a5) (two-sided Welch’s t-test).
Fig. 3 | Partial transcriptional conversion of Anc80-Ikzf2 transduced IHCs
identified by scRNA-seq. a, Representative Myo15^Cre/+;ROSA26^CAG-tdTomato
cochlear whole-mount staining. Myo15^Cre-driven tdTomato expression
is hair cell specific at P6 (n = 3 biologically independent samples with
similar results). Scale bar, 50 µm. b, t-distributed stochastic neighbour
embedding (t-SNE) plots of all cochlear hair cells profiled by scRNA-seq,
including the cluster to which each cell was assigned, the experimental
origin of each cell (cochlea injected with Anc80-Ikzf2 or Anc80-eGFP),
and the relative transcript abundance of Anc80-Ikzf2 measured in each
cell. c, Anc80-Ikzf2 is highly expressed in the Anc80-Ikzf2(+) IHCs
and OHCs, whereas Anc80-eGFP expression is only seen in the cells
assigned to the Anc80-Ikzf2(−) IHC and OHC clusters. Dots represent the
expression values of individual cells, with width of violins summarizing
overall relative distribution of expression. d, Canonical hair cell (HC)
markers are highly expressed in all clusters, and not notably changed
as a result of Anc80-Ikzf2 expression. e, IHC-enriched genes that are
highly expressed in control IHCs vs control OHCs, but are significantly
reduced in Anc80-Ikzf2(+) IHCs. Anc80-Ikzf2(−) IHC (n = 34) vs
Anc80-Ikzf2(+) IHC (n = 40) FDR: Slc17a8 = 2.25 x 10-, Otof = 6.76 × 10^-4
(Kruskal-Wallis test followed by post hoc pairwise Wilcoxon ranked
sum test adjusted for multiple comparisons). f, OHC-enriched genes that
are induced in Anc80-Ikzf2(+) IHCs. Anc80-Ikzf2(−) IHC (n = 34) vs
Anc80-Ikzf2(+) IHC (n = 40) FDR: Ocm = 3.65 × 10^-8, Lbh = 1.81 x 107!
(Kruskal-Wallis test followed by post hoc pairwise Wilcoxon ranked
sum test adjusted for multiple comparisons). See also Extended Data
Figs. 8 and 9.
Fig. 4 | Helios overexpression modulates expression of hair cell
markers. a, b, The IHC markers OTOF and VGLUT3 are downregulated
in Anc80-Ikzf2-transduced IHCs (n = 3 biologically independent
samples). c, The OHC marker OCM is expressed in Anc80-Ikzf2-transduced
IHCs (n = 3 biologically independent samples per condition).
d, Fcrlb expression during wild-type (WT) mouse inner ear development
as detected by in situ hybridization. At embryonic day (E) 16, Fcrlb
expression is not detected in the inner ear, but by P0 it is detected in
both IHCs and OHCs, and is largely restricted to the IHCs by P8 (n = 3
biologically independent samples per time point). e, In the absence of


littermate controls by RNA-seq. We identified 105 upregulated and 36
downregulated genes in Ikzf2^cello/cello cochleae (Supplementary Table 7),
including downregulation of the canonical OHC markers Slc26a5
and Ocm, which was confirmed by NanoString validation (Fig. 2h).
Furthermore, we did not observe modulation of other OHC-expressed
transcription factors selected from a previously published dataset’®
(Fig. 2i), suggesting that the observed dysregulation in OHC genes
results from disruption of a specific transcriptional cascade. Notably,
by P16, the transcript levels of Car7, Ocm and Slc26a5, but not Ppp1r17,
in Ikzf2^cello/cello cochleae are similar to the levels of wild-type littermate
controls, suggesting that other factors may be compensating for the
functional loss of Helios by this time point (Extended Data Fig. 6i).
To characterize the transcriptional cascade downstream of Helios,
we performed in vivo Anc80L65 adeno-associated virus (AAV) gene
delivery of a Myc-tagged Ikzf2 or enhanced green fluorescent protein
(eGFP) (hereafter termed Anc80-Ikzf2 or Anc80-eGFP, respectively)
to neonatal inner ears of Myo15^Cre/+;ROSA26^CAG-tdTomato mice, sorted
the cochlear hair cells at P8, and measured resultant changes in gene
expression using single-cell RNA sequencing (scRNA-seq)’””" (Fig. 3a,
Extended Data Fig. 7). The hair cells from inner ears injected with
Anc80-Ikzf2 separated into two distinct sets of clusters, containing both
IHCs and OHCs. One set of IHCs and OHCs completely overlapped
with the hair cells from the control ears injected with Anc80-eGFP
(Fig. 3b, bottom clusters), whereas the other set clustered separately
functional Helios (Ikzf2^cello/cello mouse), Fcrlb is robustly expressed in IHCs
and OHCs at P8. IHC expression of Fcrlb is not affected by Anc80-eGFP
transduction, whereas Fcrlb expression is lost in Anc80-Ikzf2-transduced
hair cells (n = 3 biologically independent samples per condition). f, g,
Prestin expression can be seen in Anc80-Ikzf2-transduced IHCs up to
8 weeks of age (n = 3 biologically independent samples at 6-8 weeks) (f),
and overlaps with MYC staining (g). Scale bars, 10 µm (a-e), 100 µm (f)
and 20 µm (g). Arrows denote OHCs, arrowheads denote IHCs. See also
Extended Data Fig. 10.


(Fig. 3b, top clusters). Separation of the two sets of clusters showed a 
clear correlation with expression of the Anc80-Ikzf2 transgene (Fig. 3b), 
in which hair cells in the bottom clusters had lower expression of 
Anc80-Ikzf2, and the hair cells in the top clusters had higher expres- 
sion of Anc80-Ikzf2 (hereafter defined as Anc80-Ikzf2 low (—) and high 
(+), respectively). Because the hair cells defined as Anc80-Ikzf2(—) 
clustered together with the hair cells transduced with Anc80-eGFP, 
these two groups of hair cells were merged and named Anc80-Ikzf2(—) 
IHCs and OHCs for all downstream analyses (Fig. 3b, c). 

Although the overexpression of Ikzf2 in IHCs and OHCs did not 
change the expression of hair cell markers such as Pou4f3 and Calb1 
(Fig. 3d), it led to a significant downregulation of many genes whose 
transcripts were identified as IHC-enriched in the control hair cell 
populations, including Slc17a8, Otof, Rprm, Atp2a3 and Fgf8 (Fig. 3e,
Extended Data Fig. 8, Supplementary Tables 8-10). Notably, some of the 
genes that are downregulated in both Anc80-Ikzf2(+) IHCs and OHCs 
are genes that are normally expressed in both cell types in early postna- 
tal development, and that later become IHC-specific'””° (for example, 
Pvalb and Otof; Supplementary Table 10). This suggests that the over- 
expression of Ikzf2 in OHCs results in an accelerated downregulation 
of these genes. Furthermore, Ikzf2 overexpression in IHCs results in 
the upregulation of genes that are normally enriched in OHCs, such 
as Ocm, Pde6d, Ldhb and Lbh (Fig. 3f, Extended Data Fig. 8). Overall, 
these data suggest that during normal OHC development, Helios 
functions to decrease the expression of early pan-hair-cell markers,
such as Otof, in the maturing OHCs, as well as to upregulate OHC
marker genes. A correlation analysis further validates the role of Helios
in regulating OHC-related gene expression (Extended Data Figs. 8, 9,
Supplementary Table 11). The effect of Ikzf2 transduction on IHC gene
expression was also validated by immunolabelling for OTOF, VGLUT3,
OCM or prestin and by in situ hybridization for Fcrlb (Fig. 4, Extended
Data Fig. 10a, b). Analysis of the surface characteristics of the trans- 
duced IHCs does not show a change from an IHC-like to an OHC-like 
stereociliary bundle, consistent with a partial role for Helios in regulat- 
ing OHC-fate (Extended Data Fig. 10c). However, Ikzf2 transduction 
resulted in the appearance of prominent voltage-dependent (nonlinear) 
capacitance in IHCs (Extended Data Fig. 10d, e), which is an electrical 
‘signature’ of prestin-dependent OHC electromotility”!””. These data 
indicate that Anc80-Ikzf2-transduced IHCs start to acquire the major 
function of normal OHCs. 

In conclusion, our study demonstrates that Helios is necessary for 
hearing and is a crucial regulator of gene expression in the maturing 
postnatal OHC. In particular, our results suggest that Helios functions 
to suppress IHC and early pan-hair-cell gene expression in OHCs, as 
well as to upregulate canonical OHC marker genes. It further shows 
that Helios is sufficient to induce the essential functional character- 
istic of electromotility and many of the molecular characteristics of 
OHCs when expressed in early postnatal IHCs, albeit not all of them, 
supporting the notion that additional OHC-expressed transcription 
factors are involved in postnatal OHC development. To our knowledge, 
this is the first study to demonstrate functional shifts in postnatal hair- 
cell molecular identities via viral gene delivery, and it suggests that the 
delivery of combinations of transcription factors may lead to successful 
regeneration of functional OHCs in the deafened cochlea. 
METHODS


No statistical methods were used to predetermine sample size. The experiments 
were not randomized, and investigators were not blinded to allocation during 
experiments and outcome assessment unless stated otherwise. 

Animal procedures. Animal procedures performed at the University of Maryland
School of Medicine were carried out in accordance with the National Institutes
of Health Guide for the Care and Use of Laboratory Animals and have been
approved by the Institutional Animal Care and Use Committee at the University
of Maryland, Baltimore (protocol numbers 1112005 and 1015003). The RiboTag
(maintained on a C57BL/6N background), prestin^CreERT2 and Myo15^Cre mouse
models (maintained on a C57BL/6J background) have been described
previously®”3, and were provided by M. K. Lobo (RiboTag), J. Zuo (prestin^CreERT2),
and C. Petit and T. Friedman (Myo15^Cre). CBA/CaJ mice (stock 000654) and
B6.Cg-Gt(ROSA)26Sor^tm14(CAG-tdTomato)Hze/J mice (stock 007914, referred to as
ROSA26^CAG-tdTomato) were procured from the Jackson Laboratory. The
specificity of prestin^CreERT2 was determined by crossing prestin^CreERT2/CreERT2 mice to
ROSA26^CAG-tdTomato mice, and the resulting offspring were dissected at P21 for
whole-mount immunohistochemistry. To generate animals for the RiboTag OHC
RNA-seq dataset, RiboTag^HA/HA mice were crossed to prestin^CreERT2/CreERT2 mice to
produce RiboTag^HA/+;prestin^CreERT2/+ mice. These mice were further intercrossed
to obtain double homozygous RiboTag^HA/HA;prestin^CreERT2/CreERT2 animals, which
were then crossed to CBA/CaJ mice to generate F1 RiboTag^HA/+;prestin^CreERT2/+
offspring on a mixed CBA/C57BL/6 background, avoiding the recessively inherited
age-related hearing loss phenotype inherent to C57BL/6 mice”*. Recombination
was induced by tamoxifen injection (3 mg per 40 g body weight in mice younger
than 21 days, 9 mg per 40 g body weight in mice 21 days or older), and cochlear
tissues were collected at the following ages: P8, P14, P28, 6 weeks and 10 weeks.
For the cello RNA-seq and NanoString experiments, cochlear ducts from Ikzf2^+/+,
Ikzf2^cello/+ and Ikzf2^cello/cello mice were dissected at P8 and P16. CD-1 or C57BL/6
pregnant females were procured from Charles River or the University of Maryland
School of Medicine Veterinary Resources. Resulting neonates were injected with
Anc80L65 virus between P1 and P3, and dissected for later analyses between P8 and
8 weeks. For the Anc80L65-transduced IHC scRNA-seq experiment, Myo15^Cre/+
mice were crossed to ROSA26^CAG-tdTomato mice, and resulting offspring were
injected with Anc80L65 virus between P1 and P3, and the cochlear epithelium
was collected at P8. Additionally, several litters with Anc80-Ikzf2-injected pups
and their control littermates (aged P7-P8), together with a mother, were sent to
the University of Kentucky for the measurements of nonlinear (voltage-dependent)
capacitance, an electrical ‘signature’ of electromotility. All animal procedures
for these experiments were approved by the Institutional Animal Care and Use
Committee at the University of Kentucky (protocol 00903M2005). Both male and
female animals were used for all experiments.
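
As an illustration of the tamoxifen dosing rule stated above, the following minimal sketch computes a per-animal dose. It assumes that "per 40 g body weight" means the dose scales linearly with body weight; the function name and the example weights are hypothetical and are not taken from the study.

```python
def tamoxifen_dose_mg(body_weight_g: float, age_days: int) -> float:
    """Return a tamoxifen dose in mg for one mouse.

    Rates follow the text: 3 mg per 40 g body weight for mice younger
    than 21 days, 9 mg per 40 g body weight for mice 21 days or older.
    Linear scaling with body weight is an assumption of this sketch.
    """
    if body_weight_g <= 0:
        raise ValueError("body weight must be positive")
    mg_per_40_g = 3.0 if age_days < 21 else 9.0
    return mg_per_40_g * body_weight_g / 40.0


# Hypothetical animals: a 4 g pup at P6 and a 20 g mouse at P28.
pup_dose = tamoxifen_dose_mg(4.0, 6)
adult_dose = tamoxifen_dose_mg(20.0, 28)
```
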

Animal procedures performed at the MRC Harwell Institute were licensed by
the Home Office under the Animals (Scientific Procedures) Act 1986, UK and
additionally approved by the relevant Institutional Ethical Review Committees.
The cello mutant mouse was originally identified from the MRC Harwell Institute
phenotype-driven N-ethyl-N-nitrosourea (ENU) Ageing Screen’. In this
screen, ENU-mutagenized C57BL/6J males were mated with wild-type ‘sighted
C3H’ (C3H.Pde6b+) females”. The resulting G1 males were crossed with
C3H.Pde6b+ females to produce G2 females, all of which were screened for the Cdh23^ahl
allele’*. Cdh23^+/+ G2 females were then backcrossed to their G1 fathers to generate
recessive G3 pedigrees, which entered a longitudinal phenotyping pipeline.
Auditory phenotyping comprised clickbox testing at 3, 6, 9 and 12 months of
age and ABR at 9 months of age. The Ikzf2^del890 mutant line was generated by
the Molecular and Cellular Biology group at the MRC Harwell Institute using
a CRISPR-Cas9-mediated deletion approach. Both male and female mice were
used for experiments.

RiboTag immunoprecipitations. RiboTag immunoprecipitations were performed
as described previously’. In brief, for one biological sample, 10 cochlear ducts from
5 mice were pooled and homogenized in 1 ml of supplemented homogenization
buffer (50 mM Tris-HCl, pH 7, 100 mM KCl, 12 mM MgCl2, 1% Nonidet P-40,
1 mM 1,4-dithiothreitol, 1× protease inhibitor cocktail, 200 U ml^-1 RNaseOUT,
100 µg ml^-1 cycloheximide, 1 mg ml^-1 heparin). Homogenates were spun down
(9,400g for 10 min at 4°C) to remove particulates. Then, 40 µl of homogenate was
reserved for total RNA isolation (input control), and the remaining homogenate
was incubated with 5 µg haemagglutinin (HA) antibody (BioLegend) at 4°C
under gentle rotation for 4-6 h. The supernatant was then added to 300 µl of
rinsed Invitrogen Dynabeads Protein G magnetic beads (Thermo Fisher), and
incubated overnight at 4°C under gentle rotation. The next day, bound beads
were rinsed three times with 800 µl high-salt buffer (50 mM Tris-HCl, pH 7, 300
mM KCl, 12 mM MgCl2, 1% Nonidet P-40, 1 mM 1,4-dithiothreitol, 100 µg ml^-1
cycloheximide) at 4°C for 10 min, rotating. Buffer RLT (350 µl) from the RNeasy
Plus Micro kit (Qiagen) was then added to the beads or reserved input sample,
and vortexed for 30 s to release bound ribosomes and RNA. RNA was extracted
according to the manufacturer’s instructions for the RNeasy Plus Micro kit
(Qiagen), using 16 µl of nuclease-free water for elution as described previously”.
This method yielded an average of 10.9 ng of immunoprecipitated RNA (average
concentration = 0.68 ng µl^-1) and 185.6 ng of input RNA (average concentration =
10.9 ng µl^-1) for downstream analyses. All RNA samples used for RNA-seq had a
minimum RNA integrity number (RIN) of 8.

cello cochlear RNA extractions. For the cello RNA-seq, cochlear ducts from P8
Ikzf2^+/+ and Ikzf2^cello/cello mice were dissected and pooled (6 cochlear ducts per
sample) to generate two biological replicates per genotype. For the NanoString
validations, cochlear ducts from P8 Ikzf2^cello/cello, Ikzf2^cello/+ and Ikzf2^+/+ mice
were dissected and pooled (2-4 cochlear ducts per sample) to generate four
biological replicates per genotype. RNA was extracted using the Direct-zol RNA 
MiniPrep kit (Zymo Research) following the manufacturer’s instructions. RNA 
quality and concentration were assessed using the Agilent RNA Pico kit (Agilent 
Technologies). All RNA samples used for RNA-seq had a minimum RNA integrity 
number (RIN) of 8. 

RNA-seq and normalization. RiboTag OHC RNA-seq libraries were prepared 
using the NEBNext Ultra Directional RNA Library Prep Kit for Illumina 
(New England Biolabs), and samples were sequenced in at least biological duplicates 
on a HiSeq 4000 system (Illumina) using a 75-bp paired end read configuration. 
P8 Ikzf2^+/+ and Ikzf2^cello/cello RNA libraries were prepared using the TruSeq RNA
Sample Prep kit (Illumina), and samples were sequenced in biological duplicates
on a HiSeq 2000 system (Illumina) using a 125-bp paired-end read configuration.
Reads were aligned to the Mus musculus reference genome (assembly GRCm38.87 
(RiboTag) or GRCm38.84 (P8 cello)) using TopHat v.2.0.87°, and HTSeq was 
used to quantify the number of reads aligning to predicted coding regions””. See 
Supplementary Table 12 for alignment statistics. Expression levels were normalized 
using quantile normalization. In downstream analyses, only genes covered by at 
least 20 reads in a minimum of two samples from the same biological condition 
were considered as expressed. Significant differential gene expression between 
samples was assessed using DEseq’*. In addition to statistical significance between 
samples (FDR < 0.05), we also required a complete separation of expression levels 
between compared conditions for a gene to be called as differentially expressed. 
That is, for a gene to be called downregulated in condition A compared to
condition B, we required all normalized expression levels measured in the samples of
condition A to be lower than all normalized expression levels measured in the
samples of condition B. To avoid inflation of fold change estimates for lowly expressed
genes, a floor level equal to the tenth percentile of the distribution of the expression 
levels was applied (that is, all expression values below the tenth percentile were set 
to the tenth percentile value). The OHC enrichment factors were calculated for 
each gene and time point by comparing the RiboTag immunoprecipitated samples 
to matched input samples, and are defined as the log2 ratio of expression levels
between the immunoprecipitated and input samples. Inspection of these enrichment
factors revealed a systematic association with transcript length (Supplementary
Fig. 2a). Therefore, we used a locally weighted regression, implemented by the R
lowess function, to remove this systematic effect (Supplementary Fig. 2b). 
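
The filtering and enrichment steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: all function names are hypothetical, the percentile uses linear interpolation (the text does not specify a method), the floor is computed over whatever list of values is passed in, and the lowess length correction is not reproduced here.

```python
import math


def is_expressed(counts, min_reads=20, min_samples=2):
    """True if at least `min_samples` samples of one biological
    condition have at least `min_reads` reads for this gene."""
    return sum(c >= min_reads for c in counts) >= min_samples


def percentile(values, q):
    """q-th percentile (0-100) with linear interpolation (an assumption)."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def apply_floor(values, q=10):
    """Raise every value below the q-th percentile to that percentile."""
    floor = percentile(values, q)
    return [max(v, floor) for v in values]


def enrichment_factor(ip_level, input_level):
    """log2 ratio of immunoprecipitated to matched input expression."""
    return math.log2(ip_level / input_level)


def completely_separated_down(cond_a, cond_b):
    """True if every level in condition A is below every level in B,
    the extra criterion required to call a gene downregulated in A."""
    return max(cond_a) < min(cond_b)
```

For example, a gene with floored levels of 8 (immunoprecipitate) and 2 (input) has an enrichment factor of log2(8/2) = 2, that is, a fourfold enrichment.
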

Gene expression analyses. Genes with a changed level of expression in OHC 
immunoprecipitated samples at any time point relative to P8 were subjected to a 
clustering analysis using the CLICK algorithm, implemented in the EXPANDER 
package”*?°. Gene Ontology (GO) enrichment analysis was carried out using 
the EXPANDER implemented tool TANGO”. The adult mouse IHC and OHC 
transcriptomic dataset used for comparisons was generated previously® and can 
be accessed through the GEO database (accession number GSE111348)*. The 
expanded motif prediction analysis was performed using iRegulon'! through 
the Cytoscape visualization tool*". The analysis was performed on the putative 
regulatory region of 20 kb centred around the transcription start site using default 
settings. 

Immunohistochemistry. For cochlear sections, mice were euthanized by cervical 
dislocation and inner ears fixed in 4% paraformaldehyde (PFA) overnight at 4°C 
then decalcified in 4% EDTA in PBS. Ears were positioned in 4% low melting tem- 
perature agarose (Sigma-Aldrich) in upturned BEEM capsules (Agar Scientific) at 
a 45° diagonal angle, with the apex of the cochlea facing down and the vestibular 
system uppermost. Once set, the agarose block was removed from the BEEM 
capsule and 200 µm sections were cut through the mid-modiolar plane of the cochlea
using a Leica VT1000S Vibratome. Sections were simultaneously permeabilized
and blocked with 10% donkey serum (Sigma) in 0.3% Triton-X for 30 min at room 
temperature then labelled with primary antibodies for 3 h at room temperature. 
To enable detection, samples were incubated with fluorophore-coupled secondary 
antibodies for 2 h at room temperature then stained with 4’,6-diamidino-2- 
phenylindole dihydrochloride (DAPI; 1:2,500, Thermo Fisher) for 5 min. Sections 
were transferred to WillCo glass bottom dishes (Intracel) and visualized free-floating 
in PBS using a Zeiss 700 inverted confocal microscope (10-40× magnification).
Primary antibodies: goat anti-Helios M-20 (1:400, Santa Cruz Biotechnology) and
mouse anti-β-actin (1:500, Abcam). Secondary antibodies: Alexa Fluor 568 donkey
anti-goat (Invitrogen, 1:200) and Alexa Fluor 488 donkey anti-mouse (Invitrogen,
1:200).

For cochlear whole-mounts, mice were euthanized by cervical dislocation and 
inner ears fixed in 2% PFA for 30 min at 4°C. After fixation, ears were fine dissected 
to expose the sensory epithelium then immediately permeabilized in 0.2% Triton-X 
for 10 min and blocked with 10% donkey serum (Sigma) for 1 h at room temper- 
ature. Cochleae were immunolabelled with goat anti-Helios M-20 (1:400, Santa 
Cruz Biotechnology) overnight at 4°C then incubated with Alexa Fluor 568 donkey 
anti-goat secondary (1:200, Invitrogen) and the F-actin marker Alexa Fluor 488 
Phalloidin (1:200, Invitrogen) for 1 h at room temperature. Samples were washed 
with DAPI (1:2,500, Thermo Fisher) for 60 s to stain nuclei then mounted onto 
slides with SlowFade Gold (Life Technologies) and visualized using a Zeiss LSM 
710 fluorescence confocal microscope and 63× oil magnification.

Identification of the cello mutation. DNA was extracted from ear biopsies of
affected G3 mice using the DNeasy Blood and Tissue Kit (Qiagen) and used for
an initial genome-wide linkage study, using SNP markers polymorphic between
the parental strains C57BL/6J and C3H.Pde6b+ (Tepnel Life Sciences). Following
linkage to a 21.57 Mb region on chromosome 1, additional SNP markers were
identified and genotyped using standard PCR and restriction endonuclease
protocols to delineate an 8.4 Mb critical interval between SNPs rs31869113 and
rs13475914. Subsequently, high-quality DNA was extracted from the tail of an
affected G3 mouse using the Illustra Nucleon BACC2 Genomic DNA Extraction Kit 
(GE Healthcare) and sequenced by the Oxford Genomics Centre (Wellcome Trust 
Centre for Human Genetics) using the HiSeq system (Illumina). Sequencing reads 
were aligned to the mouse reference genome (assembly GRCm38) and known 
C57BL/6J and C3H.Pde6b+ SNPs were filtered out, leaving variants that were 
then given a quality score based on their sequencing read depth. Variants within 
the 8.4 Mb critical region which were deemed heterozygous, low-confidence (qual- 
ity score < 200), non-coding or synonymous were discounted. The putative Ikzf2 
lesion was amplified by standard PCR (see Supplementary Table 13 for genotyping
primers) and validated by Sanger sequencing, using DNA from an affected G3 animal,
as well as an unaffected G3 (control). Sequence gaps that spanned coding regions
were amplified by PCR using DNA from an affected G3 mouse and analysed by 
Sanger sequencing. In all cases, sequence data were assessed for variation using 
DNASTAR Lasergene software (version 12.0.0). 
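
The variant-exclusion criteria described above (heterozygous, quality score < 200, non-coding or synonymous variants discounted within the critical interval) can be expressed as a simple filter. The sketch below is illustrative only: the `Variant` record, its field names and the interval coordinates passed by the caller are hypothetical and do not reflect the actual file formats or tools used.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    chrom: str
    pos: int            # position in bp (hypothetical coordinates)
    zygosity: str       # "hom" or "het"
    quality: float      # read-depth-based quality score
    coding: bool
    synonymous: bool


def candidate_variants(variants, chrom, start, end, min_quality=200):
    """Keep variants inside the critical interval that are homozygous,
    meet the quality threshold, and are coding and non-synonymous."""
    kept = []
    for v in variants:
        in_interval = v.chrom == chrom and start <= v.pos <= end
        passes = (
            v.zygosity == "hom"
            and v.quality >= min_quality
            and v.coding
            and not v.synonymous
        )
        if in_interval and passes:
            kept.append(v)
    return kept
```

Any variant surviving such a filter would still need confirmation, as done in the study by PCR amplification and Sanger sequencing of affected and unaffected animals.
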

In silico analyses. Three independent online tools were used to predict the func- 
tional effect of the cello mutation in silico: Sorting Intolerant From Tolerant (SIFT); 
Polymorphism Phenotyping version 2 (PolyPhen-2); and Protein Variation Effect 
Analyser (PROVEAN)*?4. Structural 3D representations of wild-type and H517Q
Helios ZnF6 were predicted with RaptorX”», using peptide sequences as input, and
visualized using PyMOL software (version 1.7).

In vitro analyses. A full-length Ikzf2^+ Helios construct was prepared using the
pGEM-T Vector System II Kit (Promega) and used as a template for the generation
of an Ikzf2^cello Helios construct with the QuikChange Lightning Site-Directed
Mutagenesis Kit (Agilent Technologies). Plasmid DNA was prepared using the
Wizard Plus SV Miniprep Purification System (Promega) and validated by Sanger
sequencing. Sequence-verified Ikzf2^+ and Ikzf2^cello constructs were subcloned
in-frame into pCMV-Myc and pEGFP-C3 mammalian expression vectors (provided
by C. Esapa), to yield N-terminally tagged Ikzf2^+ and Ikzf2^cello Helios. See
Supplementary Table 13 for cloning and mutagenesis oligonucleotide sequences.

Constructs were subsequently used for subcellular localization studies using 
male Cercopithecus aethiops SV 40 transformed kidney cells (Cos-7) cells that had 
been seeded onto 22 x 22 mm glass coverslips in six-well plates, at a volume of 
1 x 10° cells per well. After 24h (or when 50-60% confluent), cells were transiently 
transfected with 1 µg DNA of the Ikzf2+-Myc or Ikzf2cello-Myc Helios construct
using JetPEI DNA Transfection Reagent (Polyplus Transfection). At 24 h after 
transfection, cells were fixed in 4% PFA for 10 min and permeabilized with 1% 
Triton-X for 15 min at room temperature. After blocking in 10% donkey serum 
(Sigma) for 1 h at room temperature, cells were immunolabelled with goat anti- 
Helios M-20 primary antibody (1:600, Santa Cruz Biotechnology) overnight at 4°C, 
then incubated with Alexa Fluor 488 donkey anti-goat secondary antibody (1:200, 
Invitrogen) and F-actin marker Texas Red-X Phalloidin (1:200, Invitrogen) for 1h 
at room temperature. Cells were washed with DAPI (1:2,500, Thermo Fisher) for 
60 s. Coverslips were mounted onto slides with SlowFade Gold (Life Technologies) 
and cells were visualized using a Zeiss LSM 710 multiphoton fluorescence confocal 
microscope at 63× oil magnification.

Constructs were also used for co-immunoprecipitation studies using human 
embryonic kidney (HEK293T) cells that had been seeded directly onto six-well 
plates at a volume of 5 x 10° cells per well. Cells were transiently co-transfected 24h 
later with a total of 2 µg plasmid DNA to mimic the wild-type (1 µg Ikzf2+-Myc
Helios + 1 µg Ikzf2+-GFP Helios), heterozygous (1 µg Ikzf2+-Myc Helios + 1 µg
Ikzf2cello-GFP Helios; 1 µg Ikzf2cello-Myc Helios + 1 µg Ikzf2+-GFP Helios) or
homozygous (1 µg Ikzf2cello-Myc Helios + 1 µg Ikzf2cello-GFP Helios) states using
JetPEI DNA Transfection Reagent (Polyplus Transfection). Single transfections
with either 1 µg Ikzf2+-GFP Helios or 1 µg Ikzf2+-Myc Helios were also carried out
for negative controls. Cells were lysed in 250 µl of 1x RIPA buffer (150 mM NaCl,
1% NP-40, 0.5% deoxycholate, 0.1% SDS, 50 mM Tris, pH 7.5, in milliQ water) 
48 h after transfection, then incubated with Protein G Sepharose Beads (Sigma) 
for 2 h at 4°C. The beads were pelleted by centrifugation and the supernatant incubated
with either 1 µg of mouse anti-cMyc 9E10 antibody (Developmental Studies
Hybridoma Bank) or 1-2 µg of custom-made rabbit anti-GFP antibody overnight
at 4°C. The immunoprecipitation complexes were captured using Protein G beads, 
washed with RIPA buffer and released by incubation with NuPAGE Reducing 
Agent (Novex). Immunoprecipitation reactions and their corresponding reduced 
cell lysate were analysed by western blotting. Samples were electrophoresed on 
NuPage 4-12% Bis-Tris gels (Invitrogen) and transferred onto nitrocellulose mem- 
branes using the iBlot system (Invitrogen). Membranes were incubated with mouse 
anti-cMyc 9E10 antibody (1:5,000, Developmental Studies Hybridoma Bank) and 
custom-made rabbit anti-GFP (1:1,000, CUK-1819 MGU-GFP-FL) primary antibodies.
Mouse 12G10 anti-α-tubulin (1:10,000, Developmental Studies Hybridoma
Bank) was also used as a loading control. For detection, membranes were incubated 
with goat anti-mouse IRDye 680RD (1:15,000, LI-COR) and goat anti-rabbit IRDye 
800CW secondary antibodies (1:15,000, LI-COR) and imaged using the Odyssey 
CLx Infrared Imaging System (LI-COR). For quantification, band intensities were 
determined using the Image Studio Lite Ver 5.2 software and used to calculate the 
relative ratio of the co-immunoprecipitation to immunoprecipitation signal. Cos-7 
and HEK293T cell lines used in this study were provided by C. Esapa, were not 
authenticated, but were tested and confirmed to be free of mycoplasma contamination.
Cells were grown at 37°C under 5% CO2 conditions in DMEM (Invitrogen)
containing 10% heat-inactivated fetal bovine serum (FBS) (Invitrogen) and 
1x penicillin/streptomycin (Invitrogen). 
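
The quantification step described above (the relative ratio of the co-immunoprecipitation to immunoprecipitation signal, summarized across independent experiments) can be sketched as follows. This is a minimal illustration only: the function names and the band intensities are hypothetical and do not come from the study, and the actual values were obtained with Image Studio Lite.

```python
import math
import statistics


def coip_ratio(coip_band: float, ip_band: float) -> float:
    """Ratio of the co-immunoprecipitated band to the immunoprecipitated band."""
    if ip_band <= 0:
        raise ValueError("IP band intensity must be positive")
    return coip_band / ip_band


def mean_and_sem(values):
    """Mean and standard error of the mean across independent experiments."""
    if len(values) < 2:
        raise ValueError("need at least two experiments to estimate the s.e.m.")
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


# Hypothetical (co-IP, IP) band intensities from four experiments.
experiments = [(820.0, 1000.0), (760.0, 950.0), (900.0, 1100.0), (700.0, 900.0)]
ratios = [coip_ratio(coip, ip) for coip, ip in experiments]
mean_ratio, sem_ratio = mean_and_sem(ratios)
print(f"co-IP/IP = {mean_ratio:.3f} +/- {sem_ratio:.3f} (n = {len(ratios)})")
```

Dividing by the immunoprecipitated band corrects each lane for how much bait protein was pulled down, so that lanes with different pull-down efficiency remain comparable.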

ABR. ABR tests were performed using a click stimulus in addition to frequency- 
specific tone-burst stimuli to screen mice for auditory phenotypes and investigate 
auditory function. Mice were anaesthetized by intraperitoneal injection of ketamine
(100 mg ml⁻¹ at 10% v/v) and xylazine (20 mg ml⁻¹ at 5% v/v) administered
at the rate of 0.1 ml per 10 g body mass. Animals were placed on a heated mat inside 
a sound-attenuated chamber (ETS Lindgren) and electrodes were placed subder- 
mally over the vertex (active), right mastoid (reference) and left mastoid (ground). 
ABR responses were collected, amplified and averaged using TDT System 3 (Tucker 
Davis Technology) in conjunction with either BioSig RP (version 4.4.11) or BioSig
RZ (version 5.7.1) software. The TDT system click ABR stimuli comprised clicks of 0.1 ms
broadband noise spanning approximately 2-48 kHz, presented at a rate of 21.1 s⁻¹
with alternating polarity. Tone-burst stimuli were of 7 ms duration, inclusive of
1 ms rise/fall gating using a Cos² filter, presented at a rate of 42.5 s⁻¹ and were
measured at 8, 16 and 32 kHz. All stimuli were presented free-field to the right ear 
of the mouse, starting at 90 dB SPL and decreasing in 5 dB increments. Auditory 
thresholds were defined as the lowest dB SPL that produced a reproducible ABR 
trace pattern and were determined manually. All ABR waveform traces were viewed 
and re-scored by a second operator blinded to genotype. Animals were recovered 
using 0.1 ml of anaesthetic reversal agent atipamezole (Antisedan, 5 mg ml⁻¹ at 1%
v/v), unless aged P16, when the procedure was performed terminally. 
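
As a worked example of the dosing arithmetic in the anaesthesia protocol above, the sketch below converts a stock concentration, its volume fraction in the injected mix and the injection rate into a dose per kilogram of body mass. It assumes that "100 mg per ml at 10% v/v" means a 100 mg per ml stock making up 10% of the final mix by volume; that reading, and the function name, are assumptions rather than statements from the source.

```python
def dose_mg_per_kg(stock_mg_per_ml: float, fraction_v_v: float,
                   ml_per_10_g: float = 0.1) -> float:
    """Dose delivered per kg body mass for one component of the injected mix.

    stock_mg_per_ml: concentration of the stock solution
    fraction_v_v:    volume fraction of the stock in the mix (0.10 for 10% v/v)
    ml_per_10_g:     injected volume per 10 g of body mass
    """
    working_mg_per_ml = stock_mg_per_ml * fraction_v_v  # concentration in the mix
    ml_per_kg = ml_per_10_g * 100                       # 1 kg = 100 x 10 g
    return working_mg_per_ml * ml_per_kg


ketamine = dose_mg_per_kg(100, 0.10)
xylazine = dose_mg_per_kg(20, 0.05)
print(f"ketamine {ketamine:.0f} mg/kg, xylazine {xylazine:.0f} mg/kg")
```

Under this reading the mix delivers roughly 100 mg per kg ketamine and 10 mg per kg xylazine.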

Generation of Ikzf2del890 mice. The Ikzf2del890 mutant line was generated by the
Molecular and Cellular Biology group at the Mary Lyon Centre, MRC Harwell
Institute using CRISPR-Cas9 gene editing, as described previously (see
Supplementary Table 13 for single-guide RNA (sgRNA) sequences, donor oligonucleotide
sequences and genotyping primers). For construction of each sgRNA plasmid,
a pair of single-stranded donor oligonucleotides (IDT) was hybridized and
cloned using Gibson Assembly Master Mix (NEB) into linearized p_1.1 plasmid
digested with StuI and AflII to express sgRNAs under the T7 promoter.

The p_1.1_sgRNA plasmids were linearized with XbaI, purified with phenol-chloroform,
and the products were used as templates from which sgRNAs were
in vitro transcribed. sgRNAs were synthesized using MEGAshortscript T7 
Transcription Kit (Ambion). RNAs were purified using MEGAclear Transcription 
Clean-Up Kit (Ambion). RNA quality was assessed using a NanoDrop (Thermo 
Scientific) and by electrophoresis on 2% agarose gel containing Ethidium Bromide 
(Fisher Scientific). 

As this exon deletion mutant was generated as part of an experiment to generate
a floxed mutant, an Ikzf2 flox long single-stranded DNA (lssDNA) donor was also
synthesized as described previously for inclusion in the microinjection mix.

For microinjections, the pronuclei of one-cell stage C57BL/6NTac embryos
were injected with a mix containing Cas9 mRNA (5meC, Ψ; Tebu-Bio/TriLink
Biotechnologies) at 100 ng µl⁻¹, the four Ikzf2 sgRNAs, each at 50 ng µl⁻¹, and the
Ikzf2 flox ssDNA donor at 50 ng µl⁻¹ prepared in microinjection buffer. Injected
embryos were re-implanted in pseudo-pregnant CD-1 females, which were allowed
to litter and rear F0 progeny.

For genotyping, genomic DNA was extracted from ear biopsies of F0 and F1 mice
using DNA Extract All Reagents Kit (Applied Biosystems) and amplified by PCR 
using high fidelity Expand Long Range dNTPack (Roche) and specific genotyping 
primers (see Supplementary Table 13). PCR products were further purified using
QIAquick Gel Extraction Kit (Qiagen) and analysed by Sanger sequencing. Copy
counting experiments by droplet digital PCR (ddPCR) against a known two-copy
reference (Dot1l) were also carried out to confirm the exon deletion and that there
were no additional integrations of the lssDNA donor. Mice carrying the del890
deletion allele were subsequently mated with mice carrying the cello mutation
to generate Ikzf2cello/del890 compound heterozygotes for complementation testing.
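
Copy counting against a two-copy reference is commonly computed as the target-to-reference concentration ratio scaled by the reference copy number. The sketch below illustrates that general calculation; it is not taken from the source, the function names are hypothetical, and the concentrations are invented for illustration.

```python
def copies_per_genome(target_conc: float, reference_conc: float,
                      reference_copies: int = 2) -> float:
    """Estimate target copies per diploid genome from ddPCR concentrations.

    target_conc and reference_conc are in the same units (e.g. copies per µl).
    reference_copies is the known copy number of the reference locus.
    """
    if reference_conc <= 0:
        raise ValueError("reference concentration must be positive")
    return reference_copies * target_conc / reference_conc


def nearest_copy_number(estimate: float) -> int:
    """Round a copy-number estimate to the nearest whole number of copies."""
    return int(round(estimate))


# Hypothetical readings: an assay inside the deleted exon in a heterozygote.
estimate = copies_per_genome(target_conc=480.0, reference_conc=1000.0)
print(f"{estimate:.2f} copies, called as {nearest_copy_number(estimate)}")
```

In this scheme, a heterozygous deletion carrier would be expected to show about one copy for an assay inside the deleted region, while an estimate above the expected value for a donor-specific assay would point to an additional integration.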

Scanning electron microscopy. Mice were euthanized by cervical dislocation and 
inner ears were removed and fixed in 2.5% glutaraldehyde (TAAB Laboratories 
Equipment Ltd) in 0.1 M phosphate buffer for 4 h at 4°C. After decalcification in 
4.3% EDTA, cochleae were dissected to expose the organ of Corti, and subjected 
to ‘OTO’ processing (1 h incubation in 1% osmium tetroxide (TAAB Laboratories 
Equipment), 30 min incubation in 1% thiocarbohydrazide (Sigma), 1 h incubation 
in 1% osmium tetroxide), before dehydration in increasing concentrations of etha- 
nol (25%, 40%, 60%, 80%, 95%, 2 x 100%) at 4°C. Samples were critical point dried 
with liquid CO2 using an Emitech K850 (EM Technologies), then mounted on
stubs using silver paint (Agar Scientific) and sputter coated with platinum using a 
Quorum Q150R S sputter coater (Quorum Technologies). Samples were examined 
using a JEOL JSM-6010LV Scanning Electron Microscope. Hair cell bundle counts 
were performed by counting the number of OHC and IHC bundles adjacent to ten 
pillar cells in the apical (<180° from apex), mid (180-450° from apex) and basal 
(>450° from apex) regions of the cochlea. At least three ears (one ear per mouse) 
were analysed for each genotype at each time point. 
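
The regional boundaries used for the bundle counts can be expressed as a small helper. This is an illustrative sketch: the function name and the example counts are hypothetical, and only the angular cut-offs (apical below 180°, mid from 180° to 450°, basal above 450°) come from the text.

```python
from collections import defaultdict


def cochlear_region(angle_from_apex_deg: float) -> str:
    """Assign a position along the cochlear spiral to apical, mid or basal."""
    if angle_from_apex_deg < 0:
        raise ValueError("angle from apex cannot be negative")
    if angle_from_apex_deg < 180:
        return "apical"
    if angle_from_apex_deg <= 450:
        return "mid"
    return "basal"


# Hypothetical (angle, OHC bundles counted adjacent to ten pillar cells) pairs.
samples = [(90, 38), (270, 37), (400, 36), (540, 30)]
by_region = defaultdict(list)
for angle, ohc_bundles in samples:
    by_region[cochlear_region(angle)].append(ohc_bundles)

for region, counts in by_region.items():
    print(region, sum(counts) / len(counts))
```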

Electrophysiological analyses. Electrophysiological recordings were made from 
OHCs of cello mice aged P9-P18. Cochleae were dissected in normal extracellular
solution (in mM): 135 NaCl, 5.8 KCl, 1.3 CaCl2, 0.9 MgCl2, 0.7 NaH2PO4,
5.6 D-glucose, 10 HEPES-NaOH. Sodium pyruvate (2 mM), MEM amino acids
solution (50×, without L-glutamine) and MEM vitamins solution (100×) were
added from concentrates (Fisher Scientific). The pH was adjusted to 7.5 (osmolality
approximately 308 mmol kg⁻¹). The dissected cochleae were transferred to
a microscope chamber, immobilized as previously described and continuously
perfused with a peristaltic pump using the above extracellular solution. The organs 
of Corti were viewed using an upright microscope (Nikon FN1) with Nomarski 
optics (60x objective). 

MET currents were elicited by stimulating the hair bundles of P9 OHCs in the 
excitatory and inhibitory direction using a fluid jet from a pipette (tip diameter 
8-10 µm) driven by a piezoelectric disc. The pipette tip of the fluid jet was
positioned near to the bundles to elicit a maximal MET current. Mechanical stimuli
were applied as 50 Hz sinusoids (filtered at 0.25 kHz, 8-pole Bessel) with driving
voltages of ±40 V. MET currents were recorded with a patch pipette solution
containing (in mM): 106 Cs-glutamate, 20 CsCl, 3 MgCl2, 1 EGTA-CsOH, 5
Na2ATP, 0.3 Na2GTP, 5 HEPES-CsOH, 10 sodium phosphocreatine (pH 7.3).
Membrane potentials were corrected for the liquid junction potential (-11 mV). 

Patch clamp recordings were performed using an Optopatch (Cairn Research) 
amplifier. Patch pipettes were made from soda glass capillaries (Harvard 
Apparatus) and had a typical resistance in extracellular solution of 2-3 MΩ. To
reduce the electrode capacitance, patch electrodes were coated with surf wax
(Mr. Zog’s SexWax). Potassium current recordings were performed at room
temperature (22-24°C) and the intracellular solution contained (in mM): 131 KCl, 3
MgCl2, 1 EGTA-KOH, 5 Na2ATP, 5 HEPES-KOH, 10 Na2-phosphocreatine (pH 7.3;
osmolality approximately 296 mmol kg⁻¹). Data acquisition was controlled by
pClamp software (version 10) using Digidata 1440A boards (Molecular Devices). 
Recordings were low-pass filtered at 2.5 kHz (8-pole Bessel), sampled at 5 kHz and 
stored on computer for off-line analysis (Origin, OriginLab). Membrane potentials 
in voltage clamp were corrected for the voltage drop across the uncompensated 
residual series resistance and for a liquid junction potential (—4 mV). 

The presence of electromotile activity in P16-P18 OHCs was estimated by 
applying a depolarizing voltage step from the holding potential of —64 mV to 
+56 mV. Changes in cell length were viewed and recorded with a Nikon FN1 
microscope (75× magnification) with a Flash 4.0 SCCD camera (Hamamatsu).
Cell body movement was tracked using Fiji software. Lines were drawn across the 
basal membrane of patched OHCs, perpendicular to the direction of cell motion, 
and a projected time-based z-stack of the pixels under the line was made. Cell 
movement was measured with Photoshop as a pixel shift and then converted to 
nanometres (290 pixels = 10 µm).
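
The pixel-to-length conversion stated above can be written out explicitly. The sketch below uses only the calibration given in the text (290 pixels per 10 µm); the function name and the example shift are hypothetical.

```python
PIXELS_PER_CALIBRATION = 290      # pixels spanning the calibration length
CALIBRATION_LENGTH_NM = 10_000    # 10 µm expressed in nanometres
NM_PER_PIXEL = CALIBRATION_LENGTH_NM / PIXELS_PER_CALIBRATION  # about 34.5 nm


def pixel_shift_to_nm(shift_pixels: float) -> float:
    """Convert a measured pixel shift of the cell base into nanometres."""
    return shift_pixels * NM_PER_PIXEL


# Hypothetical example: a 12-pixel shift measured during the voltage step.
print(f"{pixel_shift_to_nm(12):.0f} nm")
```

With this calibration each pixel corresponds to about 34.5 nm, which sets the smallest displacement that a whole-pixel measurement can resolve.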

Nonlinear (voltage-dependent) capacitance of IHCs in Anc80-Ikzf2-injected 
mice and their non-injected littermates was studied at P12-P16 using conventional 
whole-cell patch clamp recordings. Apical turn of the organ of Corti was carefully 
dissected in Leibovitz’s L-15 cell culture medium (21083027, Gibco/ThermoFisher) 
containing the following inorganic salts (in mM): 137 NaCl, 5.4 KCl, 1.26 CaCl2,
1.0 MgCl2, 1.0 Na2HPO4, 0.44 KH2PO4 and 0.81 MgSO4, and placed into a custom-made
recording chamber, where it was held by two strands of dental floss. The
organ of Corti explants were viewed with an upright microscope (BX51WIE, 
Olympus), equipped with a high numerical aperture (NA) objective (100x, 1.0 
NA). To block voltage-gated ion channels in IHCs, the bath solution was made of
L-15 medium supplemented with 10 mM tetraethylammonium-Cl, 2 mM CoCl2,
10 mM CsCl and 0.1 mM nifedipine (all from Sigma), while the intrapipette
solution contained (in mM): 140 CsCl, 2.5 MgCl2, 2.5 Na2ATP, 1.0 EGTA and
5 HEPES. During recordings, the organs of Corti were continuously perfused with 
the above extracellular bath solution. Whole-cell current responses were recorded 
with MultiClamp 700B patch clamp amplifier (Molecular Devices), controlled by 
jClamp software (SciSoft). Membrane capacitance was measured during the voltage 
ramp with a dual sinusoidal, FFT-based method. The recorded capacitance was
fitted to the first derivative of a two-state Boltzmann function that is typically used
to fit nonlinear capacitance of OHCs plus a small correction for the membrane area
changes between expanded and contracted states of prestin, as follows:

$$C_m = C_v + C_{lin}$$

in which $C_m$ is the total membrane capacitance, $C_v$ is a voltage-dependent
(nonlinear) component, and $C_{lin}$ is a voltage-independent (linear)
component.

$$C_v = Q_{max}\,\frac{ze}{kT}\,\frac{b}{(1+b)^2} + \frac{\Delta C_{sa}}{1+b^{-1}}, \qquad b = \exp\left(\frac{-ze\,(V - V_{pk})}{kT}\right)$$

in which $Q_{max}$ is the maximum nonlinear charge moved, $V_{pk}$ is the voltage at
peak capacitance, $V$ is membrane potential, $z$ is valence, $e$ is electron charge, $k$
is Boltzmann’s constant, $T$ is absolute temperature, and $\Delta C_{sa}$ is the maximum
increase in capacitance that occurs when all prestin molecules change from compact
to expanded state. To account for some variability in sizes of IHCs, statistical
data are shown as the maximum of voltage-dependent component of capacitance
($C_v$) normalized to the linear capacitance of the cell ($C_v/C_{lin}$).
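
For readers who want to evaluate the capacitance model numerically, the sketch below implements the first derivative of a two-state Boltzmann function plus the area-change correction, using the symbols defined above (maximum charge, valence, peak voltage, maximum capacitance increase and linear capacitance). It is an illustration under stated assumptions, not the authors' fitting code: the function name, the default temperature (295.15 K) and the example parameter values are hypothetical.

```python
import math

K_BOLTZMANN = 1.380649e-23          # J/K
ELECTRON_CHARGE = 1.602176634e-19   # C


def membrane_capacitance(v, q_max, z, v_pk, delta_c_sa, c_lin, temp_k=295.15):
    """Total capacitance C_m = C_v + C_lin at membrane potential v.

    SI units throughout: v and v_pk in volts, q_max in coulombs,
    delta_c_sa and c_lin in farads. temp_k is an assumed temperature.
    """
    beta = z * ELECTRON_CHARGE / (K_BOLTZMANN * temp_k)    # ze/kT, in 1/V
    b = math.exp(-beta * (v - v_pk))
    bell = q_max * beta * b / (1.0 + b) ** 2               # Boltzmann derivative
    area_term = delta_c_sa / (1.0 + 1.0 / b)               # area-change correction
    return bell + area_term + c_lin


# Hypothetical parameters, for illustration only.
params = dict(q_max=0.1e-12, z=0.8, v_pk=-0.04, delta_c_sa=0.05e-12, c_lin=8e-12)
for mv in (-150, -40, 100):
    c_m = membrane_capacitance(mv / 1000.0, **params)
    print(f"{mv:5d} mV: C_m = {c_m * 1e12:.3f} pF")
```

At the peak voltage the exponential term equals 1, so the voltage-dependent component reduces to the maximum charge times ze/kT divided by 4, plus half of the maximum capacitance increase, which gives a quick sanity check on any fit.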

DPOAEs. DPOAE tests were performed using frequency-specific tone-burst 
stimuli at 8, 16 and 32 kHz with the TDT RZ6 System 3 hardware and BioSig 
RZ (version 5.7.1) software (Tucker Davis Technology). An ER10B+ low noise 
probe microphone (Etymotic Research) was used to measure the DPOAE near 
the tympanic membrane. Tone stimuli were presented via separate MF1 (Tucker 
Davis Technology) speakers, with f1 and f2 at a ratio of f2/f1 = 1.2 (L1 = 65 dB SPL,
L2 = 55 dB SPL), centred around the frequencies of 8, 16 and 32 kHz. Surgical
anaesthesia was achieved by intraperitoneal injection of ketamine (100 mg ml⁻¹
at 10% v/v), xylazine (20 mg ml⁻¹ at 5% v/v) and acepromazine (2 mg ml⁻¹ at 8%
v/v) administered at a rate of 0.1 ml per 10 g body mass. Once the required depth
of anaesthesia was confirmed by the lack of the pedal reflex, a section of pinna was 
removed to allow unobstructed access to the external auditory meatus. Mice were 
then placed on a heated mat inside a sound-attenuated chamber (ETS-Lindgren) 
and the DPOAE probe assembly was inserted into the ear canal using a pipette tip 
to aid correct placement. In-ear calibration was performed before each test. The 
f1 and f2 tones were presented continuously and a fast Fourier transform was performed
on the averaged response of 356 epochs (each approximately 21 ms). The
level of the 2f1 − f2 DPOAE response was recorded and the noise floor calculated
by averaging the four frequency bins either side of the 2f1 − f2 frequency.
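
The distortion-product frequency and the flanking-bin noise floor can be sketched as below. Several points are assumptions rather than statements from the source: that "four bins either side" means four bins on each side of the distortion-product bin, that levels are averaged directly in the units supplied, and that the bin spacing follows from the epoch length. All names and values are hypothetical.

```python
def dp_frequency(f1_hz: float, f2_hz: float) -> float:
    """Frequency of the cubic distortion product, 2*f1 - f2."""
    return 2 * f1_hz - f2_hz


def dp_bin_index(f1_hz: float, f2_hz: float, bin_width_hz: float) -> int:
    """Index of the FFT bin closest to the distortion-product frequency."""
    return round(dp_frequency(f1_hz, f2_hz) / bin_width_hz)


def noise_floor(levels, dp_bin: int, bins_each_side: int = 4) -> float:
    """Mean level of the bins flanking the distortion-product bin.

    levels is a list of spectral levels, one per FFT bin. The bin that holds
    the distortion product itself is excluded from the average.
    """
    lo = dp_bin - bins_each_side
    hi = dp_bin + bins_each_side
    if lo < 0 or hi >= len(levels):
        raise ValueError("not enough bins around the distortion product")
    flank = list(levels[lo:dp_bin]) + list(levels[dp_bin + 1:hi + 1])
    return sum(flank) / len(flank)


# Hypothetical tone pair with f2/f1 = 1.2 and a hypothetical 50 Hz bin spacing
# (a 21 ms epoch would give a spacing of roughly 1/0.021 Hz).
f1, f2 = 8000.0, 9600.0
print(dp_frequency(f1, f2), dp_bin_index(f1, f2, 50.0))
```

A response is then typically judged against this local noise estimate rather than against a fixed level, because the noise varies with frequency and recording conditions.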

NanoString validation. Cochlear RNA extracted from biological triplicates of
Ikzf2cello/cello, Ikzf2cello/+ and Ikzf2+/+ animals at P8 was processed for NanoString
validation at the UMSOM Institute for Genome Sciences using the nCounter 
Master Kit per manufacturer's instructions, and quantified using the NanoString 
nCounter platform. See Supplementary Table 13 for NanoString probe sequences. 
Data were analysed using nSolver 4.0 software (NanoString). 

Anc80L65 AAV vector construction. The Anc80L65-Myc-Ikzf2+ (Anc80-Ikzf2)
expression vector was designed to drive expression of a Myc-tagged Ikzf2 con- 
struct followed by a bovine Growth Hormone poly-adenylation (BGH pA) site 
under control of the cytomegalovirus (CMV) promoter. The Anc80L65-eGFP 
(Anc80-eGFP) expression construct also contained a Woodchuck Hepatitis
Virus Posttranscriptional Regulatory Element (WPRE) preceding the BGH pA
site. Anc80L65 AAV vectors were produced by the Gene Transfer Vector Core,
Grousbeck Gene Therapy Center at the Massachusetts Eye and Ear Infirmary 
(http://vector.meei.harvard.edu/). 

Inner ear gene delivery. For in vivo hair cell transductions, mice were injected 
with Anc80L65 AAVs between P1 and P3 via the posterior semicircular canal using 
the injection method described previously. In brief, animals were anaesthetized on
ice before a post-auricular incision was made on either the left or right side. Tissues 
were further dissected to reveal the posterior semicircular canal, and a Nanolitre 
2010 microinjection system (World Precision Instruments) equipped with a loaded 
glass needle was used to inject 700 nl of 1.13 x 10’? genome copies (GC) per ml 
Anc80-Ikzf2 or 500 nl of 4.85 x 10!? GC per ml Anc80-eGFP. Injections into the 
inner ear were performed in 50 nl increments over the course of 2 min. The needle 
was then removed, the incision sutured, and animals were placed on a 37°C heating 
pad to recover before being returned to their cage. 

FACS. For the scRNA-seq analysis of Anc80-Ikzf2 transduced hair cells, inner ears 
of neonatal Myo15Cre/+;ROSA26CAG-tdTomato mice were injected with Anc80-Ikzf2
(4 mice) or control Anc80-eGFP (2 mice) via the posterior semicircular canal. 
Cochlear tissues from both injected and uninjected ears were obtained at P8 and
further dissected to reveal the sensory epithelium. Inclusion of the uninjected 
ear in the single cell analysis allowed for the study of changes in gene expression 
that occur in response to a gradient of transgene expression. This is because, in 
mice, inner ear gene delivery often results in transduction in the contralateral ear, 
albeit at a lower intensity. Cochlear tissues were then dissociated for fluorescence-activated
cell sorting (FACS) using the method described previously. In brief, the
sensory epithelia from Anc80-eGFP- and Anc80-Ikzf2-injected mice were pooled
separately into two wells of a 48-well plate containing 0.5 mg ml⁻¹ thermolysin
(Sigma). Tissues were incubated at 37°C for 20 min, after which the thermolysin 
was removed and replaced with Accutase enzyme (MilliporeSigma). After a 3-min
incubation at 37°C, tissues were mechanically disrupted using a 23G blunt-ended
needle connected to a 1 ml syringe. This step was performed twice. After confirming
tissue dissociation by direct visualization, the dissociation reaction was stopped
by adding an equal volume of IMDM supplemented with 10% heat-inactivated FBS
to the Accutase enzyme solution. Cells were passed through a 40 µm cell strainer
(BD) to remove cell clumps. tdTomato-expressing hair cells were sorted into ice-cold
tubes containing IMDM with 10% FBS on a BD FACSAria II (BD Biosciences)
and processed for scRNA-seq. Flow cytometry analyses were performed with assistance
from X. Fan at the University of Maryland Marlene and Stewart Greenebaum
Comprehensive Cancer Center Flow Cytometry Shared Service. 

scRNA-seq. tdTomato-positive sorted hair cells were pelleted once (300g at 
4°C) and resuspended in a minimal remaining volume (around 30 µl). Hair cell-enriched
single-cell suspensions were then used as input on the 10x Genomics
Chromium platform with 3′ Single Cell v2 chemistry (10x Genomics). After capture
and library preparation, scRNA-seq libraries were sequenced on a NextSeq
500 (Illumina) in collaboration with the NIDCD Genomics and Computational 
Biology Core. Samples were sequenced to an average depth of over 300,000 reads 
per cell, which resulted in detection of a median of >3,000 genes (Anc80-eGFP) 
and >4,000 genes (Anc80-Ikzf2) per cell, ensuring maximal transcriptional 
complexity and detection of low-abundance transcripts (see Extended Data 
Fig. 9b, c). Reads were aligned to a modified mm10 mouse reference containing 
the sequences for the Ai14 locus, as well as Anc80-eGFP and Anc80-Ikzf2 viral
sequences (Extended Data Fig. 9a) using the 10x Genomics Cell Ranger (version
2.0.2) package to generate the read counts matrix files. Read counts from viral
and Ai14 loci were removed from the expression matrix before dimensionality
reduction so as to not influence data clustering. Cells from these hair cell clusters 
were determined to be Anc80-Ikzf2(+) versus Anc80-Ikzf2(—), and IHCs 
versus OHCs, based on their expression of Anc80-Ikzf2 and Slc17a8, respectively 
(Fig. 3, Extended Data Figs. 8 and 9, Supplementary Table 9). Slc26a5 was not 
well detected in the scRNA-seq dataset and was therefore not used as an OHC 
marker. After clustering, four hair cells were excluded based on co-expression of a 
contaminating cell type. Secondary analyses, including shared nearest neighbour 
(SNN) clustering, t-SNE embedding, and differential expression testing (using 
either Wilcoxon ranked sum for marker gene identification or MAST for pairwise 
comparison between control IHCs and OHCs) were performed in R with Seurat 
(version 2.1.0). Non-parametric analysis of variance between the four classified
groups of HCs (IHCs and OHCs with either high or low Anc80-Ikzf2 expression)
using a Kruskal-Wallis test was performed to help identify genes whose expression
differed significantly across these cell populations. This was followed by post hoc
pairwise Wilcoxon ranked sum comparisons to assess multiple-comparison-adjusted
P values. Additional plots were generated by NMF (version 0.20.6) and
ggplot2 (version 2.2.1). These analyses used the computational resources of
the NIH HPC Biowulf cluster (http://hpc.nih.gov). 
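
Two of the bookkeeping steps described above, labelling each hair cell by its Anc80-Ikzf2 and Slc17a8 expression and removing viral and Ai14 counts before dimensionality reduction, can be sketched in plain Python. This is not the Seurat workflow used in the study: the data layout, function names and count thresholds are hypothetical, and the study does not state the cut-offs it used.

```python
# Features excluded before dimensionality reduction (viral and reporter loci).
EXCLUDED_FEATURES = {"Anc80-Ikzf2", "Anc80-eGFP", "Ai14"}


def classify_cell(virus_count: int, slc17a8_count: int,
                  virus_threshold: int = 1, ihc_threshold: int = 1):
    """Label one cell by hair cell type and transduction status.

    Slc17a8 expression marks IHCs; thresholds are illustrative assumptions.
    """
    cell_type = "IHC" if slc17a8_count >= ihc_threshold else "OHC"
    status = "Anc80-Ikzf2(+)" if virus_count >= virus_threshold else "Anc80-Ikzf2(-)"
    return cell_type, status


def strip_features(counts: dict, excluded=EXCLUDED_FEATURES) -> dict:
    """Drop excluded features from a gene -> per-cell counts mapping."""
    return {gene: row for gene, row in counts.items() if gene not in excluded}


# Hypothetical count matrix: three cells, stored as gene -> list of counts.
counts = {
    "Anc80-Ikzf2": [12, 0, 7],
    "Ai14": [40, 35, 50],
    "Slc17a8": [0, 9, 6],
    "Ocm": [15, 0, 1],
}
labels = [classify_cell(v, s)
          for v, s in zip(counts["Anc80-Ikzf2"], counts["Slc17a8"])]
for_clustering = strip_features(counts)
print(labels)
print(sorted(for_clustering))
```

The order matters in this sketch: cells are labelled while the viral counts are still present, and only afterwards are those features removed so that they cannot drive the clustering.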

Immunohistochemistry of AAV-injected cochleae. Mouse inner ears injected 
with either Anc80-Ikzf2 or Anc80-eGFP were collected between P8 and 8 weeks of age, fixed in
4% PFA in PBS overnight at 4°C, and decalcified in a solution of 5% EDTA in 
RNAlater (Invitrogen). Decalcified ears were processed by sucrose gradient and 
embedded in OCT compound (Tissue-Tek) for cryosectioning, or fine dissected for 
whole-mount immunohistochemistry. Cryosections (10 µm) on positively charged
glass slides were used for in situ hybridization (ISH) and section immunohisto- 
chemistry. For whole-mount immunolabelling at 6-8 weeks, hair cell loss was 
observed in the injected ear and therefore the contralateral ear, expressing a lower 
level of the Anc80-Ikzf2 virus, was used. Primary antibodies: goat anti-prestin 
N-20 (1:200, Santa Cruz Biotechnology); goat anti-oncomodulin N-19 (1:100, 
Santa Cruz Biotechnology); rabbit anti-myosinVI (1:1,000, Proteus BioSciences); 
rabbit anti-GFP (1:100, Life Technologies); mouse anti-cMyc 9E10 (1:100, Santa 
Cruz Biotechnology) and mouse anti-otoferlin (1:100, Abcam). The guinea pig 
anti-VGLUT3 antibody (1:5,000) used in this study was donated by R. Seal.
Corresponding Alexa Fluor 488 and 546 (1:800, Invitrogen) were used for secondary 
detection, Alexa Fluor 488 Phalloidin (1:1,000, Invitrogen) was used to mark 
F-actin, and DAPI (1:20,000, Thermo Fisher) was used to mark cell nuclei. Images 
were acquired using a Nikon Eclipse E600 microscope (Nikon) equipped with a 
Lumenera Infinity 3 camera. Whole-mount images were acquired using a Zeiss
LSM DUO confocal microscope, located at the UMSOM Confocal Microscopy
Core, at 63× oil magnification. Images were processed using Infinity Capture and
Infinity Analyze software (Lumenera), and ImageJ software.

RNA in situ hybridization. In situ hybridization was performed as described 
previously. In brief, slides were re-fixed in 4% PFA, and then treated with 2 µg
ml⁻¹ proteinase K for 10 min. The proteinase K reaction was stopped by soaking
slides again in 4% PFA, followed by acetylation and permeabilization.
Hybridization for the digoxigenin-labelled Fcrlb probe was performed overnight
at 65°C (see Supplementary Table 13 for Fcrlb probe primers). After a series of
washes in saline sodium citrate, slides were incubated with sheep-anti-digoxigenin 
antibody conjugated to alkaline phosphatase (Sigma-Aldrich, 1:100) overnight at 
4°C. Slides were then incubated in BM purple AP substrate precipitating solution 
(Roche) to localize bound anti-digoxigenin antibody. 

Reporting summary. Further information on research design is available in 
the Nature Research Reporting Summary linked to this paper. 


Data availability 

The RiboTag OHC RNA-seq, P8 cello cochlea RNA-seq, and P8 Anc80-Ikzf2 and 
Anc80-eGFP injected cochlea scRNA-seq data have been submitted to the Gene
Expression Omnibus (GEO) database under accessions GSE116703, GSE116702 
and GSE120462, and are also available for viewing through the gEAR portal 
(https://umgear.org/). 

Extended Data Fig. 1 | RiboTag immunoprecipitation enriches
for known OHC-expressed transcripts. a, Representative
prestinCreERT2/+;ROSA26CAG-tdTomato cochlear whole-mount. The
prestinCreERT2-driven tdTomato expression is OHC-specific at P21 (n = 1).
Scale bar, 20 µm. b, Schematic of the RiboTag immunoprecipitation
protocol. Red OHCs represent Cre/HA-tagged ribosome expression.
c, RiboTag RNA-seq log2 enrichment and depletion of transcripts for
known inner ear cell type markers (EF = log2(IP/input)). d, Genes at least
two-fold enriched in IHCs (n = 565 genes) or OHCs (n = 253 genes) in
the published dataset are significantly depleted or enriched, respectively,
by the RiboTag OHC immunoprecipitation at all time points examined 

Extended Data Fig. 3 | The Ikzf2cello mutation disrupts
homodimerization of Helios. a, Cos-7 cells transfected with Ikzf2+- or
Ikzf2cello-Myc. Nuclear localization is unaffected by the Ikzf2cello mutation.
n = 2 biologically independent experiments. Scale bars, 10 µm.
b, Co-immunoprecipitation (IP) of Myc-tagged (62 kDa) and GFP-tagged
(88 kDa) Ikzf2+ and Ikzf2cello constructs. Transfected cell lysates were
immunoprecipitated using an anti-Myc antibody and analysed by
western blotting with both anti-Myc and anti-GFP antibodies. Results
show that wild-type Ikzf2+ Helios can dimerize, but that dimerization
is impaired by the cello mutation. LC, cell lysate loading control.
c, Reciprocal immunoprecipitation reactions using an anti-GFP antibody
confirm dimerization of wild-type Ikzf2+ Helios and reduced dimerization
of mutant Ikzf2cello Helios. d, Quantification of co-immunoprecipitation
western blots. Band intensities were determined and used to calculate
the relative ratio of the co-immunoprecipitation to immunoprecipitation
signal. n = 4 biologically independent experiments. Data are mean ± s.e.m.
Anti-Myc IP: ***P < 0.0001 (Ikzf2+-Myc + Ikzf2+-GFP vs Ikzf2+-Myc
+ Ikzf2cello-GFP, vs Ikzf2cello-Myc + Ikzf2+-GFP and vs Ikzf2cello-Myc
+ Ikzf2cello-GFP). *P = 0.0476 (Ikzf2cello-Myc + Ikzf2+-GFP vs Ikzf2cello-Myc
+ Ikzf2cello-GFP). P = 0.1488 (Ikzf2+-Myc + Ikzf2cello-GFP vs
Ikzf2cello-Myc + Ikzf2+-GFP). P = 0.9020 (Ikzf2+-Myc + Ikzf2cello-GFP vs
Ikzf2cello-Myc + Ikzf2cello-GFP). Anti-GFP IP: ***P < 0.0001 (Ikzf2+-Myc
+ Ikzf2+-GFP vs Ikzf2cello-Myc + Ikzf2+-GFP, vs Ikzf2+-Myc + Ikzf2cello-GFP
and vs Ikzf2cello-Myc + Ikzf2cello-GFP). *P = 0.0202 (Ikzf2cello-Myc +
Ikzf2+-GFP vs Ikzf2+-Myc + Ikzf2cello-GFP). *P = 0.0346 (Ikzf2cello-Myc +
Ikzf2+-GFP vs Ikzf2cello-Myc + Ikzf2cello-GFP). P = 0.9894 (Ikzf2+-Myc +
Ikzf2cello-GFP vs Ikzf2cello-Myc + Ikzf2cello-GFP) (one-way ANOVA with
Tukey post hoc test). See Supplementary Fig. 1 for source images.

Extended Data Fig. 4 | Auditory function and HC bundle survival in
cello mice. a, Representative click ABR waveforms for Ikzf2+/+, Ikzf2cello/+
and Ikzf2cello/cello littermates at P16. n = 4 biologically independent animals
per genotype. b, c, Averaged ABR thresholds for cello mice at 1 month
of age (b) and 9 months of age (c). Age-matched Ikzf2+/+ and Ikzf2cello/+
controls display thresholds within the expected range (15-30 dB SPL) at all
time points tested. n = 5 biologically independent animals per genotype.
Data are mean thresholds ± s.e.m. 1-month Ikzf2cello/cello vs Ikzf2+/+:
****P < 0.0001 (8 kHz, 16 kHz, 32 kHz, click). 1-month Ikzf2cello/cello vs
Ikzf2cello/+: ****P < 0.0001 (8 kHz, 16 kHz, 32 kHz, click). 9-month
Ikzf2cello/cello vs Ikzf2+/+: ****P < 0.0001 (8 kHz, 16 kHz, 32 kHz, click).
9-month Ikzf2cello/cello vs Ikzf2cello/+: ****P < 0.0001 (8 kHz, 16 kHz,
32 kHz, click) (one-way ANOVA with Tukey post hoc test). d, OHC and
IHC bundle counts for cello mice from P16 to 18 months of age. Grey,
Ikzf2+/+; black, Ikzf2cello/+; red, Ikzf2cello/cello. Data are mean ± s.e.m. N.S.,
non-significant. *P < 0.05, **P < 0.01, ***P < 0.001, ****P < 0.0001
(one-way ANOVA with Tukey post hoc test). Number of biologically
independent samples for OHC and IHC bundle counts are shown. See also
Supplementary Tables 5 and 6.

Extended Data Fig. 5 | Scanning electron microscopy of cello mice
and auditory function of Ikzf2cello/del890 compound heterozygotes.
a, Scanning electron micrographs of the organ of Corti of cello mice from
P16 to 18 months of age. Representative images from the mid-region of
the cochlear spiral are shown. Scale bars, 10 µm. n = 3 (P16 Ikzf2cello/+,
P16 Ikzf2cello/cello, 1-m Ikzf2cello/cello, 9-m Ikzf2+/+, 18-m Ikzf2+/+,
18-m Ikzf2cello/+, 18-m Ikzf2cello/cello), n = 4 (P16 Ikzf2+/+, 1-m Ikzf2+/+,
3-m Ikzf2+/+, 3-m Ikzf2cello/cello, 6-m Ikzf2+/+, 6-m Ikzf2cello/cello,
9-m Ikzf2cello/+, 9-m Ikzf2cello/cello) and n = 5 (1-m Ikzf2cello/+, 3-m Ikzf2cello/+,
6-m Ikzf2cello/+) biologically independent samples. b-d, Scanning electron
micrographs of OHC stereocilia bundles of cello mice at P16, showing
that wild-type Ikzf2+/+ (b), Ikzf2cello/+ (c) and mutant Ikzf2cello/cello (d)
mice display overall expected bundle patterning. Images are from the
mid-region of the cochlear spiral. Scale bars, 1 µm. n = 3 biologically
independent samples for each genotype. e, The genomic and domain
structure of Ikzf2. Black, 5′ untranslated region; light grey, N-terminal
DNA-binding domain; dark grey, C-terminal dimerization domain.
The Ikzf2cello mutation lies in ZnF6. The del890 mutation deletes exon
4 and the surrounding intronic sequence. f, Averaged ABR thresholds
for Ikzf2cello/del890 compound heterozygotes at 1 month of age, showing
increased thresholds (>40 dB SPL) at all frequencies tested compared
to Ikzf2+/+, Ikzf2cello/+ and Ikzf2del890/+ control colony mates. Data
are mean ± s.e.m. n = 4 (Ikzf2+/+, Ikzf2+/del890), n = 2 (Ikzf2cello/+) and
n = 5 (Ikzf2cello/del890) biologically independent samples. Ikzf2cello/del890
vs Ikzf2+/+: *P = 0.011 (8 kHz), **P = 0.002 (16 kHz), ****P < 0.0001
(32 kHz), ***P = 0.0001 (click); Ikzf2cello/del890 vs Ikzf2cello/+: P = 0.078
(8 kHz), *P = 0.034 (16 kHz), **P = 0.001 (32 kHz), **P = 0.001 (click);
Ikzf2cello/del890 vs Ikzf2+/del890: *P = 0.025 (8 kHz), **P = 0.009 (16 kHz),
***P = 0.0002 (32 kHz), ***P = 0.0002 (click) (one-way ANOVA with
Tukey post hoc test).
Extended Data Fig. 6 | The MET and adult-like potassium currents are 
normal in Ikzf2cello mice. a, b, MET currents were recorded from OHCs 
of P9 Ikzf2cello/cello and Ikzf2cello/+ (control) littermates. During voltage 
steps, hair bundles were displaced by applying 50-Hz sinusoidal force 
stimuli (the driver voltage to the fluid jet is shown above the traces)*?. At 
hyperpolarized membrane potentials (-121 mV), saturating excitatory 
bundle stimulation (that is, towards the taller stereocilia) elicited a large 
inward MET current from both Ikzf2cello/+ and Ikzf2cello/cello OHCs, whereas 
inhibitory bundle stimulation (that is, away from the taller stereocilia) 
closed the MET channels and reduced the resting current. Because the 
MET current reverses near 0 mV, it became outward when excitatory 
bundle stimulation was applied during voltage steps positive to its 
reversal potential. At positive membrane potentials (+99 mV), excitatory 
bundle stimulation now elicited similar outward MET currents with 
larger resting amplitudes. Arrows indicate closure of the MET channels 
(that is, disappearance of the resting current) during inhibitory bundle 
displacements, arrowheads indicate the larger resting MET current at 
+99 mV compared to -121 mV. c, Peak-to-peak current-voltage curves 
obtained from Ikzf2cello/+ (n = 10 biologically independent samples) 
and Ikzf2cello/cello (n = 8 biologically independent samples) OHCs at P9. 
The maximal MET current and the resting open probability of the MET 
channel were found to be similar between the two genotypes. Data are 
mean ± s.e.m. d, e, Total K+ currents recorded from P18 Ikzf2cello/+ 

control (d) and Ikzf2cello/cello mutant (e) OHCs. The size of the K+ current, 
which is mainly due to the negatively activated IK,n (in addition to a small 
delayed rectifier IK; ref. 15), was smaller in Ikzf2cello/cello OHCs. f, Average peak 
current-voltage relationship for the total K+ current recorded from the 
OHCs of Ikzf2cello/+ (n = 9 OHCs from 6 biologically independent animals) 
and Ikzf2cello/cello (n = 7 OHCs from 5 biologically independent animals) 
mice at P16-P18. Data are mean ± s.e.m. g, h, After normalization to 
the significantly reduced surface area of Ikzf2cello/cello OHCs (for this set 
of experiments: Ikzf2cello/+: 14.2 ± 0.4 pF; Ikzf2cello/cello: 11.2 ± 0.5 pF; 
P < 0.0005), both the total IK (g) and isolated IK,n (h) were not significantly 
different between the two genotypes at P16-P18 (two-sided Welch's t-test). 
Data are mean ± s.e.m. i, NanoString validations of genes downregulated 
in P8 Ikzf2cello/cello cochleae at P16, normalized to wild-type reads. Data 
are mean ± s.d. (n = 4 biologically independent samples per genotype). 
*P = 0.038 (Ppp17r1 in Ikzf2cello/cello vs Ikzf2+/+), *P = 0.037 (Ppp17r1 in 
Ikzf2cello/cello vs Ikzf2cello/+) (two-sided Welch’s t-test). 
Extended Data Fig. 7 | Transduction of cochlear hair cells using 
Anc80L65 and hair cell enrichment by flow cytometry. a, Schematic 
representation of inner ear viral gene delivery via the posterior 
semicircular canal of CD-1 mice for hair cell marker immunolabelling. 
b, Immunolabelling for GFP in the Anc80-eGFP injected, and MYC in 
the Anc80-Ikzf2 injected ears, showing mainly hair cell transduction, 
although some MYC staining could also be observed in supporting cells 
(blue arrow). n= 3 biologically independent samples per condition. 
Nuclear MYC staining suggests proper trafficking of the MYC-tagged 
Helios protein in transduced cells. White arrows indicate OHCs, white 
arrowheads indicate IHCs. Scale bars, 10 µm. c, d, Flow cytometry of 
dissociated cochlear GFP-positive and tdTomato-positive cells from P8 
Myo15Cre/+;ROSA26CAG-tdTomato mice injected with either Anc80-eGFP 

(c, 2 mice) or Anc80-Ikzf2 (d, 4 mice). Cells were first gated by forward 
and side scatter to exclude doublets. For the Anc80-eGFP-transduced 
cochlear sample, transduced cells were identified based on GFP 
expression, and hair cells were further identified by tdTomato expression. 
tdTomato single-positive, GFP single-positive and tdTomato and GFP 
double-positive cells were collected. For the Anc80-Ikzf2-transduced 
cochlear sample, hair cells were gated based on tdTomato single-positive 
expression and collected. 
Extended Data Fig. 8 | Transcriptional conversion of Anc80-Ikzf2- 
transduced IHCs. a, Heat map for the top 30 differentially expressed genes 
between all hair cells profiled. Scaled expression values shown as z-scores, 
with yellow indicating higher and purple indicating lower expression than 
the mean. b, OHC enriched genes that are induced in Anc80-Ikzf2(+) 
IHCs. Anc80-Ikzf2(—) IHC (n = 34) vs Anc80-Ikzf2(+) IHC (n= 40) 
FDR: Pde6d = 2.03 x 10~?, Ldhb = 3.74 x 10-1". Dots represent the 
expression values of individual cells, with width of violins summarizing 
overall relative distribution of expression. c, IHC enriched genes that are 
highly expressed in control IHCs vs control OHCs, but are significantly 
reduced in Anc80-Ikzf2(+) IHCs. Anc80-Ikzf2(—) IHC (n= 34) vs Anc80- 
Ikzf2(+) IHC (n= 40) FDR: Fgf8 = 3.30 x 1074, Atp2a3 = 2.46 x 107°, 
Rprm=2.27 x 10-8 (Kruskal-Wallis test followed by post hoc pairwise 
Wilcoxon ranked sum test adjusted for multiple comparisons). 

d, IHC-enriched genes that show only moderately reduced expression 

in Anc80-Ikzf2(+) IHCs. Anc80-Ikzf2(—) IHC (n = 34) vs Anc80- 
Ikzf2(+) IHC (n= 40) FDR: Shtn1 =8.59 x 10-5, Tbx2 =3.88 x 10-8, 
Cabp2= 1.40 x 10~'° (Kruskal-Wallis test followed by post hoc pairwise 
Wilcoxon ranked sum test adjusted for multiple comparisons). e, f, Top 20 
genes negatively (e) or positively (f) correlated with Ikzf2 expression in 
control hair cells, shown alongside corresponding correlations of gene 


expression within all Anc80-Ikzf2-transduced hair cells, Anc80-Ikzf2- 
transduced IHCs, or Anc80-Ikzf2 transduced-OHCs. See also Extended 
Data Fig. 9. g, Genes that are negatively correlated with Ikzf2 (n = 20, 
Pearson correlation < —0.6) are not enriched in OHCs at P8 compared 

to all other genes detected in the RiboTag OHC dataset (background 
genes, n = 13,124). Genes that are positively correlated with Ikzf2 

(n=41, Pearson correlation > 0.6) are significantly enriched in OHCs 

at P8 compared to background genes (n = 13,103) (P = 0.025, two-sided 
Wilcoxon's test). Black line represents median enrichment factor (log2 fold 
change), box demarcates first and third quartiles, whiskers demarcate first 
and third quartile ± 1.5 x IQR values, dots represent single outliers. 

h, One of the most differentially expressed genes observed in our 
scRNA-seq experiment was Fcrlb, a gene which encodes an Fc receptor like 
protein, and the expression of which has not been previously described 
in the ear. Fcrlb is significantly downregulated in Anc80-Ikzf2(+) hair 
cells. Anc80-Ikzf2(-) IHC (n = 34) vs Anc80-Ikzf2(+) IHC (n = 40) 
FDR = 4.89 x 107°. Anc80-Ikzf2(-) OHC (n = 132) vs Anc80-Ikzf2(+) 
OHC (n = 148) FDR = 6.88 x 10-8 (Kruskal-Wallis test followed by 

post hoc pairwise Wilcoxon ranked sum test adjusted for multiple 
comparisons). See also Supplementary Tables 8-11. 
Extended Data Fig. 9 | scRNA-seq allows for high-resolution 
discrimination of cell types and their transcriptional changes due to 
overexpression of Ikzf2. a, Custom annotation strategy with theoretical 
reads mapping to unambiguous regions of the various custom viral loci, as 
well as those regions that get discarded because of endogenous sequence 
similarity (that is, ambiguous reads). b, Violin plots of the overall scRNA-seq 
detection metrics, including number of unique molecules detected in 
each of the major cell type clusters identified (low Anc80-Ikzf2 expressing 
IHCs: viral Ikzf2 (vIk)- IHCs n = 34; low Anc80-Ikzf2 expressing OHCs: 
vIk- OHCs n = 132; high Anc80-Ikzf2 expressing IHCs: vIk+ IHCs n = 40; 
high Anc80-Ikzf2 expressing OHCs: vIk+ OHCs n = 140; and non-HCs: 
n=219). c, FeaturePlots with red showing higher expression across all 
profiled cells, including cells identified as non-hair cells. Expression 

from loci captured with custom annotation shown to support cluster 
identification. A final labelled t-SNE plot shows all cells profiled clustered 
by predicted cell type. Misc, cells from all miscellaneous clusters with 
fewer than 5 cells; NSC, non-sensory epithelial cell; SC, organ of Corti 
supporting cell. Other clusters are defined by the highest differentially 
expressed marker gene. d, Pearson correlation scatter plots for selected 
genes within all profiled hair cells, hair cells from the Anc80-eGFP sample, 
or IHCs from the Anc80-Ikzf2 sample. e, A Pearson correlation heat 

map of all hair cells detected showing overall transcriptional similarities 
between the non-transduced IHCs and OHCs, along with the Anc80- 
Ikzf2-transduced IHCs and OHCs. 
Extended Data Fig. 10 | Ikzf2 overexpression induces prestin 
expression and electromotility in IHCs but does not affect hair 
bundle morphology. a, The OHC electromotility protein prestin is 
expressed in the OHCs of Ikzf2cello/cello mutants (n = 6 biologically 
independent samples). In addition, the pattern of prestin expression is 
not affected by Anc80-eGFP transduction, but is induced in Anc80-Ikzf2- 
transduced IHCs (n = 3 biologically independent samples per condition). 
Scale bars, 10 µm. b, Expression of prestin can be seen in Anc80-Ikzf2- 
transduced IHCs as early as P8 and up to 8 weeks of age, and overlaps 
with MYC staining (n = 6 biologically independent samples at P8, n = 3 
biologically independent samples at 6-8 weeks). Scale bars, 20 µm. 

c, Scanning electron micrographs of IHC and OHC stereocilia bundles of 
Anc80-Ikzf2- and Anc80-eGFP-injected mice at P23 showing expected 
bundle patterning. Images are from the mid-basal region of the cochlear 
spiral. Scale bars, 1 µm. Number of biologically independent samples 
(P16-P23): Anc80-Ikzf2-injected cochlea n= 8, Anc80-Ikzf2 contralateral 
cochlea n= 6, Anc80-eGFP-injected cochlea n = 3. d, Representative 
traces of the voltage-dependent (nonlinear) component of the membrane 
capacitance (an electrical signature of electromotility) in the IHCs 

of Anc80-Ikzf2-injected mouse (red) and its non-injected littermate 
(grey). Mice were injected with Anc80-Ikzf2 at P2 and recorded at P16. 

e, Normalized maximal nonlinear capacitance in all recorded IHCs of 
mice injected with Anc80-Ikzf2 at P2 (red) at different ages after injection 
and their non-injected littermates (black). Each symbol represents one 
biologically independent cell, and the total number of cells is indicated 

in parentheses. Because Anc80-Ikzf2 transduction is not 100% efficient 

in the apical turn of the cochlea at the time points tested, some IHCs of 
Anc80-Ikzf2-injected mice do not show prominent nonlinear capacitance, 
whereas the other IHCs do. In the IHCs with maximal nonlinear 
capacitance of more than 0.25 pF (presumably due to Ikzf2 expression), 
the parameters of the Boltzmann fit were as follows (mean ± s.e.m.): 
Qmax = 0.10 ± 0.02 pC; V1/2 = -31 ± 1 mV; z = 0.91 ± 0.02; Clin = 11.7 ± 1.2 pF; 
ΔCsa = 0.14 ± 0.07 pF (n = 12). For information on the fitting procedure, 
see Methods. 
Linking a cell-division gene and a suicide gene to 
define and improve cell therapy safety 


Qin Liang, Claudio Monetti, Maria V. Shutova, Eric J. Neely, Sabiha Hacibekiroglu, Huijuan Yang, Christopher Kim, 
Puzheng Zhang, Chengjin Li, Kristina Nagy, Maria Mileikovsky, Istvan Gyongy, Hoon-Ki Sung & Andras Nagy 


Human pluripotent cell lines hold enormous promise for the 
development of cell-based therapies. Safety, however, is a crucial 
prerequisite condition for clinical applications. Numerous groups 
have attempted to eliminate potentially harmful cells through the use 
of suicide genes1, but none has quantitatively defined the safety level 
of transplant therapies. Here, using genome-engineering strategies, 
we demonstrate the protection of a suicide system from inactivation 
in dividing cells. We created a transcriptional link between the 
suicide gene herpes simplex virus thymidine kinase (HSV-TK) and 
a cell-division gene (CDK1); this combination is designated the safe- 
cell system. Furthermore, we used a mathematical model to quantify 
the safety level of the cell therapy as a function of the number of cells 
that is needed for the therapy and the type of genome editing that is 
performed. Even with the highly conservative estimates described 
here, we anticipate that our solution will rapidly accelerate the entry 
of cell-based medicine into the clinic. 

Most randomly integrated transgenes show variegated expression” *. 
To achieve reliable expression, we generated a transcriptional link 
between a cell-division locus (CDL), which is a gene that is essential for 
a cell to divide or to survive, and a drug-inducible suicide system (SU), 
which resulted in a CDL-SU allele (Fig. 1a). Therefore, the expression 
of the CDL and the drug-inducible suicide system are tightly linked 
and, if required, either only the dividing cells or all of the cells can 
be arrested or eliminated by treatment with the drug that induces the 
suicide system. 

We consider a cell population a safe-cell batch when all cells within 
that batch contain a functional suicide system. The safe-cell level (SCL) 
is the number of therapeutic batches in which there is expected to be 
only one non-safe batch (Fig. 1b), such that one in a thousand gives 
an SCL of 1,000 and one in a million gives an SCL of 1,000,000. We used 
various in vitro and in vivo experiments as well as mathematical mod- 
elling to define the SCL as the function of the number of cells needed 
for therapy. 
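
As a minimal sketch of this definition, the SCL can be read as the reciprocal of the frequency of non-safe batches. The function names below are illustrative only and do not come from the paper.

```python
def safe_cell_level(non_safe_batches: int, total_batches: int) -> float:
    """SCL: how many batches are produced per one expected non-safe batch."""
    if non_safe_batches <= 0 or total_batches <= 0:
        raise ValueError("both counts must be positive")
    return total_batches / non_safe_batches


def expected_non_safe_batches(scl: float, batches_produced: int) -> float:
    """Expected number of non-safe batches among `batches_produced` at a given SCL."""
    if scl <= 0:
        raise ValueError("SCL must be positive")
    return batches_produced / scl


if __name__ == "__main__":
    print(safe_cell_level(1, 1_000))        # one non-safe batch in a thousand
    print(safe_cell_level(1, 1_000_000))    # one non-safe batch in a million
```

Read this way, a higher SCL means that more batches can be produced before a non-safe batch is expected.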

From a list of CDL candidates (Supplementary Table 1), we chose 
CDK1 and HSV-TK.007° (TK) as prototypes. The absence of CDK1 
causes a block in the G2 to M transition of the cell cycle, and other 
CDKs are not able to rescue this deficiency**. TK has been extensively 
used for cell ablation? and its mechanism of action in the presence of its 
clinically approved prodrug ganciclovir (GCV) is well characterized"®. 
CDK1 is not expressed in non-dividing cells®. Therefore, TK is not 
expressed from the CDK1-TK allele in such cells, eliminating the potential 
effect of its immunogenicity’. 

To generate a transcriptional link between CDK1 and TK (Fig. 1c), 
we inserted TK into the 3’ untranslated region of Cdk1 in mouse C2 
embryonic stem (ES) cells!” (Extended Data Fig. 1a-c) and of CDK1 in 
human H1° (Extended Data Fig. 2a-f) and human CA1 14 (Extended 
Data Fig. 3a-e) ES cells. We determined the optimal GCV concentration 
for the heterozygous Cdk1-TK-expressing mouse and heterozygous 
CDK1-TK-expressing human ES cells in vitro (Extended Data 


Figs. 1d, 2h) as well as for controlling the growth of the teratomas 
that were generated using these lines. Human ES cells implanted in 
immunodeficient NOD/SCID/IL2Rγ (NSG) mice and mouse ES cells 
implanted in isogenic C57BL/6N recipient mice resulted in teratoma 
formation with the expected efficiency (Extended Data Figs. 1e, 2). 
At a volume of 500 mm3 (day 0), we administered GCV daily by 
intraperitoneal injection for up to four weeks. GCV rendered the C2 
ES-cell-derived teratomas dormant, without growth rebound following the 
treatment (Fig. 1d and Extended Data Fig. 4a, b). H1-derived teratomas 
responded similarly, although repeated GCV treatments were occasionally 
required to stabilize the teratoma size (Fig. 1f and Extended Data 
Fig. 4d). The decrease in teratoma size after GCV readministration 
indicates regained proliferation of quiescent or slow-dividing cells 
following drug withdrawal. Consequently, TK is expressed in dividing 
cells and induces subsequent GCV sensitivity. The volume of the 
human teratomas frequently increased in the later phases; however, in 
agreement with previous reports'>!°, this was the result of cyst 
formation (Extended Data Fig. 4e) and not solid tissue growth. 

The induced long-term dormancy of teratomas was encouraging, but 
was also unexpected given that such a large tissue (approximately 10° 
cells!”) could contain numerous cells that might be capable of escaping 
the suicide system through different types of mutations. Within the 
well-encapsulated teratoma, however, these presumably resistant cells 
(escapees) could have been eliminated by the bystander-killing effect 
of the TK-GCV system'®. 

To further investigate the capacity of the safe-cell system to control 
cell proliferation, we performed a breast cancer transplantation assay 
using safe-cell mammary epithelial tumour cells!®. Upon isogenic trans- 
plantation, we observed that after a delayed period (approximately 100 
days), heterozygous safe-cell tumours became resistant to GCV and 
they continued to grow in the presence of the drug (Extended Data 
Fig. 5a, b). 

To identify escapees that appeared during the expansion of hete- 
rozygous safe-cell ES cells, we designed an in vitro experiment that 
mitigated the bystander-killing effect (Extended Data Fig. 6a, b) and 
characterized the mechanisms by which resistance occurred in eight 
independent clonal escapee lines obtained from 120 million cells. To 
determine whether the mechanism by which resistance occurred was 
caused by large genomic changes or Cdk1-TK locus-specific muta- 
tions, we analysed the copy number of Cdk1, the TK transgene and six 
endogenous genes that are found on chromosome 10 (Extended Data 
Fig. 6d, f). Only one escapee (E3 in Extended Data Fig. 6f) contained 
the TK gene (Extended Data Fig. 6c). We did not detect mutations in 
the coding region of either Cdk1 or TK (data not shown); however, the 
expression level of this allele was reduced and rendered cells GCV- 
resistant (Extended Data Fig. 6e). Another escapee, E5, was the result 
of a regional deletion that included the Cdk1-TK locus, which led to a 
more than 18.5-Mb hemizygous region in the wild-type chromosome 
(Extended Data Fig. 6d, f). Diploidy (copy number of 2) in the ten genes 
Fig. 1 | The concept, the definition, realization and properties of the 
safe-cell system. a, The suicide gene is placed into a cell-division essential 
locus (CDL), resulting in a bicistronic mRNA that is translated into two 
proteins: a cell division essential factor and a drug-inducible suicide factor. 
b, Visual representation of the SCL defined by one non-safe-cell batch 

out of many batches. c, The link between the prototype drug-inducible 
suicide system (HSV-TK) and the prototype CDL (CDK1). 3’ UTR, 3’ 
untranslated region. d, Representative growth of teratomas formed by 
mouse Cdk1-TK/Cdk1 cells, when the recipient mice were treated with 
PBS or GCV. e, Representative growth of teratomas formed by human 
CDK1-TK/CDK1 ES cells, when the recipient mice were treated with PBS 
or GCV. f, Representative growth of teratomas in mice formed by mouse 
Cdk1-TK/Cdk1-TK ES cells. g, Representative growth of teratomas in mice 
formed by human CDK1-TK/CDK1-TK ES cells. d-g, Experiments were 
repeated multiple times (see Extended Data Fig. 4) with similar results. 


and the lack of TK in the remaining six resistant clones suggested that 
these were formed by diploid loss of heterozygosity (dLOH), which 
was probably caused by mitotic recombination or chromosomal 
non-disjunction, leading to homozygosity of the wild-type Cdk1 allele in a 
diploid form. These data indicate that dLOH is the dominant mechanism 
by which the Cdk1-TK allele is lost in heterozygous ES cells, 
consistent with a study of mouse Aprt+/- heterozygous cells, in which 
dLOH accounted for 78% of the loss of gene function events””. 

To mitigate the generation of escapees by dLOH, we established both 
mouse and human ES cell lines that were homozygous for the Cdk1-TK 
and CDK1-TK alleles, respectively (Extended Data Figs. 1a-c, 2a-h, 
3a-e). As expected, we were unable to identify any escapees (Extended 
Data Fig. 6b). Homozygous ES-cell-derived teratomas behaved sim- 
ilarly to teratomas derived from heterozygous ES cells; a brief GCV 
treatment was sufficient to render the teratomas dormant (Fig. 1f, g and 
Extended Data Fig. 4c, f). As in human heterozygous ES-cell-derived 
lines, cyst formation also occurred in human homozygous ES-cell- 
derived teratomas (Extended Data Fig. 4f). 

To test the limits of our system, we generated homozygous safe-cell 
mouse mammary tumour cells and transplanted the cells into wild- 
type isogeneic C57BL/6N recipient mice. We found that the size of 
tumours was reduced following transplantation and their growth was 
formation during cell expansion. QC, quality control. c, The function 
between therapeutic cell number and SCL was determined by Monte Carlo 
homozygous; n = 10’ for CDL-SU/null heterozygous; n =13.5 x 10°- 
90.5 x 10° for (two-CDL)-SU homozygous. The bars on certain data 
points represent the 95% confidence intervals of the SCL estimates. 


restrained after GCV administration (Extended Data Fig. 5c). The 
observed growth rebound following GCV withdrawal was not surpris- 
ing as slow-dividing and quiescent tumour-prone cells survive GCV 
administration and can start proliferating in the absence of the drug. 
Nevertheless, even in this non-clinical, extreme situation in which a 
tumour cell line is used for cell transplantation, the homozygous safe- 
cell system is capable of controlling tumour growth. 

Because we were unable to identify any escapees that appeared 
in homozygous safe-cell ES cells, we used Monte Carlo simulation to 
estimate the odds of escapees in this scenario. The model considers 
three types of mutations that could potentially influence the function 
of the CDL-SU link (Extended Data Fig. 7a). Type-1 mutations (su1) 
render the suicide system non-functional while keeping the linked 
CDL operational. Type-2 mutations (su2) eliminate both the suicide 
system and CDL functionality through epigenetic or genetic changes 
to the entire locus, including a hemizygous LOH-dependent mech- 
anism. Type-3 mutations (su3) remove a functional CDL-SU allele 
by dLOH. 

To estimate the probabilities of type-1, -2 and -3 mutations (P1, 
P2 and P3, respectively) per cell generation (Extended Data Fig. 7a), 
we designed our own experiment (Supplementary Table 4) and also 
used published data?!-**. We used the values P1 = P2 = 10-° and 
P3 = 2 x 10~* per cell per division, all of which are intentional 
overestimates. Consequently, our calculated SCLs represent underestimates, 
being equal to or lower than the actual SCL. 

In silico, we subsequently generated a sufficient number of cell 
batches that were expanded from a single cell with an intact suicide 
Fig. 3 | The safe-cell system eliminates only proliferating cells. 
a, Schematic. b-d, Human CDK1-TK/CDK1-TK ES cells are differentiated 
into neural epithelial progenitors (b) and subsequently into neurons without 
(c) or with (d) GCV treatment. b, Neural epithelial progenitors consist 
of a mixed culture of proliferating neural epithelial progenitors and 
non-proliferating differentiated neurons that were treated with GCV. Scale bars, 
100 µm. This experiment was repeated three times with similar results. 


system. During each doubling, the model permits allele transitions 
(Extended Data Fig. 7b) that determine the transition graph (Fig. 2a), 
reflecting the genotype change that could occur during cell expansion. 
For the homozygous CDK1-TK/CDK1-TK simulation, we initiated the 
batch production from an SU/SU cell with two intact suicide-system 
alleles, whereas for the heterozygous CDK1-TK/CDK1WT simulation, 
the initiating cell was SU/su1, because the su1 allele is functionally 
equivalent to the CDK1WT allele. For the compound heterozygous 
CDK1-TK/CDK1null simulation, the initiating cell was SU/su2, as the 
su2 allele is the same as a CDK1null allele (Fig. 2a). 

A batch of cells is considered to be a safe-cell batch if it does not 
contain any escapees (Fig. 2b). On the basis of the frequency of getting 
a non-safe-cell batch of cells determined by the Monte Carlo simula- 
tion, we calculated the SCL as the function of the number of cells that 
is needed for a therapeutic cell batch. Figure 2c shows these functions 
for the different initiating cell genotypes described above. 
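
The simulation described above can be illustrated with a deliberately simplified sketch. It is not the authors' model: the allele names follow the text, but the update rule (only intact SU alleles mutate, and dLOH copies the homologous allele), the function names and all probabilities in the demo are assumptions chosen so that the example runs quickly.

```python
import random

INTACT, SU1, SU2 = "SU", "su1", "su2"  # allele states named as in the text


def daughter_allele(allele, other, p1, p2, p3, rng):
    """State inherited by a daughter cell for one allele (simplified rule)."""
    if allele != INTACT:
        return allele              # mutated alleles stay mutated in this sketch
    r = rng.random()
    if r < p1:
        return SU1                 # type 1: suicide gene lost, CDL still works
    if r < p1 + p2:
        return SU2                 # type 2: whole locus (CDL and SU) lost
    if r < p1 + p2 + p3:
        return other               # type 3: dLOH, replaced by a copy of the homologue
    return INTACT


def is_viable(cell):
    """A cell needs at least one allele with a working CDL to keep dividing."""
    return any(a != SU2 for a in cell)


def is_escapee(cell):
    """Escapee: can still divide but carries no intact suicide allele."""
    return is_viable(cell) and INTACT not in cell


def batch_has_escapee(start, doublings, p1, p2, p3, rng):
    """Expand one starting cell for `doublings` rounds and look for escapees."""
    cells = [start]
    for _round in range(doublings):
        next_cells = []
        for a, b in cells:
            for _daughter in range(2):
                d = (daughter_allele(a, b, p1, p2, p3, rng),
                     daughter_allele(b, a, p1, p2, p3, rng))
                if is_viable(d):
                    next_cells.append(d)
        cells = next_cells
    return any(is_escapee(c) for c in cells)


def estimate_scl(start, doublings, p1, p2, p3, n_batches, seed=0):
    """SCL estimate: simulated batches per non-safe batch (inf if none was seen)."""
    rng = random.Random(seed)
    unsafe = sum(batch_has_escapee(start, doublings, p1, p2, p3, rng)
                 for _batch in range(n_batches))
    return float("inf") if unsafe == 0 else n_batches / unsafe


if __name__ == "__main__":
    # Hypothetical, greatly inflated probabilities so that events occur in tiny batches.
    for name, start in [("SU/SU", (INTACT, INTACT)),
                        ("SU/su1", (INTACT, SU1)),
                        ("SU/su2", (INTACT, SU2))]:
        print(name, estimate_scl(start, 8, 1e-3, 1e-3, 2e-3, n_batches=200))
```

Even in this toy version, a homozygous SU/SU starting cell needs two independent hits to produce an escapee, whereas SU/su1 and SU/su2 cells need only one, which is the qualitative behaviour the text attributes to the different initiating genotypes.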

The number of cells that is required for cell therapy is disease-specific 
and is estimated to range between 10° (for example, the eye*””°) and 
10'° (for example, the heart”’) cells. The genotype scenarios presented 
in Fig. 2c show that a single TK insertion (CDK1-TK/CDK1WT or 
CDK1-TK/CDK1null) gives a low SCL. By contrast, a homozygous TK 
insertion into CDK1 significantly increases the SCL and brings the 
safety level into a clinically relevant range. However, for diseases that 
require a larger number of therapeutic cells (10° to 10’ cells), the SCL 
provided even by the homozygous CDL-SU is insufficient (SCL < 10). 
For this disease category, we propose the use of a homozygous 
modification of two different CDLs. In this scenario, our Monte Carlo 
simulation showed a strong increase in SCL (SCL > 10° for all clinically 
relevant batch volumes, Fig. 2c). 

We observed that the logarithm of SCL, as a function of the loga- 
rithm of the cell number, is very close to a linear function when the 
SCL is above 10 and the cell number is in the clinically relevant range. 
Therefore, for an estimation of SCL, we applied linear regression to 
these segments (Fig. 2c). Using these approximates, the calculation of 
SCLs becomes very simple, while retaining the desired underestimates 
(Fig. 2c): for one CDL, this equates to SCL = 10°/cn, and for two CDLs 
this equates to SCL = 10!°/cn, where ‘cn’ is the number of cells needed 
for a therapeutic cell batch. 
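
The log-log regression and the resulting SCL = K/cn approximation can be sketched as follows. The constant K is a placeholder: the exponents in the formulas above are not legible in this copy, so the value used here is hypothetical and serves only to generate example data.

```python
import math


def fit_log_log(cell_numbers, scls):
    """Least-squares line through (log10 cn, log10 SCL); returns (slope, intercept)."""
    xs = [math.log10(c) for c in cell_numbers]
    ys = [math.log10(s) for s in scls]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def scl_approx(cn, k):
    """Approximation of the form SCL = K / cn, with cn the cells per therapeutic batch."""
    return k / cn


if __name__ == "__main__":
    K = 1e12                         # hypothetical constant, NOT taken from the paper
    cns = [1e6, 1e7, 1e8, 1e9]       # example batch sizes
    slope, intercept = fit_log_log(cns, [scl_approx(c, K) for c in cns])
    print(slope, intercept)          # a slope near -1 corresponds to SCL = K / cn
```

A fitted slope close to -1 is what allows the regression line to be rewritten as a simple K/cn rule, with K equal to 10 raised to the intercept.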

During the production of therapeutic cells, some are lost during 
differentiation or expansion. Therefore, the efficiency of cell production 
should be accurately estimated and the cell numbers that are needed to 
generate a therapeutic batch should be corrected accordingly. 
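
One possible reading of this correction, as a small sketch: if only a fraction of the expanded cells ends up in the final product, the number of cells that must be generated (and that enters the SCL calculation) is the therapeutic dose divided by that fraction. The function name and the example efficiency are assumptions made for illustration.

```python
import math


def cells_to_produce(therapeutic_cells: float, efficiency: float) -> int:
    """Cells that must be generated to deliver `therapeutic_cells` at a given yield."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return math.ceil(therapeutic_cells / efficiency)


if __name__ == "__main__":
    # Hypothetical example: a dose of one million cells at 25% production efficiency.
    print(cells_to_produce(1_000_000, 0.25))
```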

In future cell therapies, if allogeneic cells are desired or HLA haplo- 
banks”* of pluripotent cell lines are available, generation of off-the-shelf 
cell batches would be advantageous. This would require the production 
of large pools of cells that, following quality control, would be aliquoted 
into therapeutic batches. We calculated the effect of aliquoting on SCL 
using both mathematical and Monte Carlo modelling approaches 
(equation (1), Supplementary Information and Extended Data Fig. 8a). 
Notably, aliquoting resulted in an approximately fivefold decrease in 
SCL using both approaches. 

Quality control should be performed on every cell pool to ensure 
that the originating cell was a safe cell and, consequently, that the SCL 
calculation is correct. To this end, we grew several batches from a single, 
homozygous safe-cell ES cell. At the early phase of expansion, we 
verified that both CDK1-TK alleles were expressed and intact using 
flow cytometry, allele-specific PCR (Extended Data Fig. 8b-e) and 
sequencing of the TK coding region. 

Both mouse and human ES cells with homozygous modifications of 
CDK1 have a normal ES cell morphology, self-renewing capacity and 
ability to differentiate (Extended Data Fig. 9). Additionally, using in vitro 
neural differentiation, we demonstrate the selective killing of dividing 
cells by the safe-cell system. Following a brief GCV treatment, all 
mitotically active cells were eliminated whereas non-dividing cells were 
spared (Fig. 3). This ability could represent a valuable safety measure 
before transplantation of cells into a patient. 
Fig. 4 | An in vivo proof of principle study shows the safe-cell system in 
action. a, A 3:1 mixture of human homozygous safe-cell ES-cell-derived 
RPE cells and homozygous safe-cell ES cells were subretinally injected 
into NSG mice and imaged using fundoscopy and optical coherence 
tomography throughout GCV or PBS treatment. b, Fundoscopy, optical 
coherence tomography and fluorescence imaging of the eyes of mice that 
received GCV treatment (4 weeks). c, Fundoscopy, optical coherence 
tomography and fluorescence imaging of the eyes of a mouse that received 
PBS treatment (3 weeks) and developed an actively growing ES-cell- 
derived lesion (mCherry+ cells). d.p.i., days post-injection. 
b, c, Experiments were repeated multiple times with similar results 
(Extended Data Fig. 10). 
To simulate a clinical cell transplantation scenario gone awry, we 
injected a 3:1 mixture of human homozygous safe-cell retinal pigment 
epithelium (RPE) cells and human homozygous safe-cell ES cells (that 
were tagged with mCherry) into the subretinal space in the eyes of NSG 
mice. Among four injected eyes, we did not observe any ES cell deriva- 
tives when GCV was administered 24 h post-injection and for 28 days, 
as a preventative measure (Fig. 4a, b and Extended Data Fig. 10a, b). 
However, cell growth was detectable in six eyes that received PBS as an 
initial treatment (Fig. 4c and Extended Data Fig. 10d-f). Notably, even 
when GCV administration was delayed three weeks post-injection and 
cell growth was present, the homozygous safe-cell system efficiently 
arrested the ES-cell-derived component of the graft; only non-dividing 
cells remained (Fig. 4c and Extended Data Fig. 10e, f). This experi- 
ment illustrates the ability of the safe-cell system to selectively eliminate 
proliferating cells after cell transplantation. Neither the initial nor the 
delayed GCV treatment affected the RPE graft or the integrity of the 
surrounding retinal tissue (Fig. 4b, c and Extended Data Fig. 10). 

No therapy is without risk. Our safe-cell system (concept and 
genome-editing approach) provides a definition of risk and a quan- 
tification of the safety level as a function of the number of cells that is 
needed for any given cell therapy. We contend that the risks associated 
with the safe-cell system are sufficiently low to provide an indispensable 
component of prospective cell therapies. Our approach to assessing 
and quantifying the safety of cell-based therapies will be critical for 
informed decision-making by regulators, clinicians and patients while 
advancing modern medicine. 


Online content 

Any methods, additional references, Nature Research reporting summaries, source 
data, statements of data availability and associated accession codes are available at 
https://doi.org/10.1038/s41586-018-0733-7. 
METHODS


Generation of targeting vectors. Targeting vectors were generated by DNA 
synthesis, molecular cloning, recombineering and the NEBuilder HiFi DNA 
Assembly Cloning Kit (New England Biolabs). 

Generation of CRISPR-Cas9 vectors. pX330-U6-Chimeric_BB-CBh-hSpCas9
was a gift from F. Zhang (Addgene plasmid 42230). Guide sequences for
CRISPR-Cas9 were analysed using the online CRISPR design tool (http://crispr.mit.edu).
Guide sequence for mouse Cdk1 targeting: TAAGAAGATGTAGCCCTC. Guide
sequence for human CDK1 targeting: CTATCTGTTGACATAACATA.

Mouse ES cell culture. C57BL/6N C2 ES cells were grown at 37 °C in 95% air, 5%
CO₂ on mouse embryonic fibroblasts (MEFs) obtained from TgN(DR4)1Jae/J mice
(The Jackson Laboratory, 003208) at all times except for one passage on gelatinized
tissue-culture plates before aggregation. Two types of medium were used. The first
medium, FBS-DMEM ES cell medium, was used for gene targeting. This medium
consisted of high-glucose DMEM supplemented with 15% FBS (previously shown
to support germline chimaera generation), 2 mM GlutaMAX, 1 mM Na pyruvate,
0.1 mM non-essential amino acids (NEAA), 50 U ml⁻¹ penicillin-streptomycin
(all Thermo Fisher Scientific), 0.1 mM 2-mercaptoethanol (Sigma-Aldrich)
and 1,000 U ml⁻¹ LIF prepared with LIF-producing plasmid. The second
medium, KSR+2i medium, was used for 2-4 passages before the generation of
ES cell chimaeras. KSR+2i medium consisted of high-glucose DMEM medium
supplemented with 15% knockout serum replacement (KSR) (Thermo Fisher
Scientific), 1 mM Na pyruvate, 0.1 mM NEAA, 0.1 mM 2-mercaptoethanol, 2 mM
GlutaMAX, 50 U ml⁻¹ penicillin-streptomycin, 500 U ml⁻¹ LIF, 5 mg ml⁻¹ insulin
(Thermo Fisher Scientific), 1 μM of the mitogen-activated protein kinase inhibitor
PD0325901 (StemGent) and 3 μM of the glycogen synthase kinase-3 inhibitor
CHIR99021 (StemGent). Cells were fed daily and passaged when they reached a
confluency of 70-80%. Then, 0.05% trypsin-EDTA (Thermo Fisher Scientific) was
used for the passaging of cells grown in FBS-DMEM and accutase (STEMCELL
Technologies) was used for cells grown in KSR+2i medium. Cells were tested
(negative) for mycoplasma contamination but were not authenticated.

Mouse ES cell targeting. In brief, 50,000 mouse C57BL/6N C2 ES cells were
transfected with 2 μg DNA (mouse target vector I or II, 1.5 μg; CRISPR vector, 0.5 μg)
using JetPrime (Polyplus). The cells were selected for G418 resistance
(160 μg ml⁻¹) starting 48 h after transfection. Resistant clones were picked
independently and replicated in 96-well plates for freezing and genotyping using
PCR. PCR-positive clones were expanded, frozen in multiple vials and genotyped
by Southern blotting.

Selection cassette excision in mouse ES cells. Correctly targeted ES cell clones
were transfected with episomal-hyPBase (for mouse target vector I) or
pCAGGs-NLS-Cre-IRES-puromycin (for mouse target vector II). Then, 2-3 days after
transfection, cells were trypsinized and plated clonally (1,000-2,000 cells per 10-cm
plate). mCherry⁺ clones were picked and transferred to independent wells of
96-well plates and genotyped by PCR and Southern blotting to confirm the
excision event. The junctions of the removal region were PCR-amplified, sequenced
and confirmed to be intact and without any frameshift mutations. GCV
(Sigma-Aldrich) was used at a final concentration of 1 μM to test for TK activity.

Human ES cell culture. Human CA1 and H1 ES cells were cultured on Geltrex
(Thermo Fisher Scientific) using mTeSR1 medium (STEMCELL Technologies)
containing 50 U ml⁻¹ penicillin-streptomycin (Thermo Fisher Scientific). Cells
were passaged using TrypLE Express (Thermo Fisher Scientific) and were
subsequently plated in mTeSR medium containing 10 μM ROCK inhibitor (Selleckchem)
for 24 h. Cells were tested (negative) for mycoplasma contamination but were not
authenticated.

Human ES cell targeting. For human targeting vectors I and II, six million human
ES cells were electroporated using a Neon Transfection System (Thermo Fisher
Scientific) with protocol 14 (pulse voltage, 1,200 mV; pulse width, 20 ms; pulse
number, 2) with 24 μg DNA (target vector, 18 μg; CRISPR vector, 6 μg). After
transfection, cells were plated on four 10-cm plates. G418 selection at 30 μg ml⁻¹
or puromycin selection at 0.75 μg ml⁻¹ was initiated 48 h after transfection.
Independent colonies were picked to 96-well plates and each plate was duplicated
for further growth and genotyping with PCR. PCR-positive clones were expanded,
frozen in multiple vials and genotyped with Southern blotting. For human target
vector III targeting, 10 million human ES cells were electroporated using Neon
protocol 14 with 40 μg DNA (human target vector III, 30 μg; CRISPR vector, 10 μg)
and plated in four 10-cm plates. Then, 3-4 days after transfection, cells that were
double-positive for mCherry and eGFP were sorted into one well of a 96-well
plate. After recovery from fluorescence-activated cell sorting (FACS), cells were
dissociated and plated clonally (1,000-2,000 cells per 10-cm plate). Next, clones
were picked independently, replicated and transferred to 96-well plates for freezing
and genotyping with PCR. PCR-positive clones were expanded, frozen in multiple
vials and genotyped by Southern blotting.

Selection cassette excision in human ES cells. In brief, one million cells from
correctly targeted ES cell clones were electroporated with 2 μg episomal-hyPBase-IRES-puro
(for human target vector I) or 2 μg episomal-Cre-IRES-puro (for human target
vector II) using Neon protocol 14. Once the cells were confluent in six-well plates,
mCherry⁺ cells were sorted into one well of a 96-well plate by FACS. After recovery,
cells were dissociated and plated clonally (1,000-2,000 cells per 10-cm plate).
Clones were picked and transferred to independent wells of 96-well plates and
genotyped by PCR and Southern blotting to confirm the excision event. The
junctions of the removal region were PCR-amplified, sequenced and confirmed to be
intact and without frameshift mutations.

PCR genotyping. For all PCR reactions, 2x Taq PCR master mix (Biomart) was 
used. Genomic DNA from human cell pellets was extracted using the DNeasy 
Blood & Tissue Kit (Qiagen). The primer pairs and conditions used for each reac- 
tion are listed in Supplementary Table 2. 

Southern blotting. In brief, 10 μg of genomic DNA was extracted from
PCR-positive clones, digested with ScaI-HF overnight, resolved by 0.6-0.7% gel
electrophoresis, and transferred to Hybond N+ (GE Healthcare). The following
probes were labelled with ³²P and used to hybridize with the membrane (around
25 ng probe per ml hybridization solution). Human CDK1 genomic probe:
PCR-amplified with primers hCDK1-Probe6-F and hCDK1-Probe6-R. Mouse Cdk1
genomic probe: PCR-amplified with primers 647302FWD and 647302REV.
mCherry probe: the entire length of mCherry. eGFP probe: the entire length of
eGFP. TK-mCherry probe: cut from hCDK1-PB-neo-TK-mCherry with Bsu36I
and SgrAI: 1,092 bp, gel-purified.

Mice. The CD-1 (ICR) (Charles River) outbred albino mouse stock was used as 
embryo donors for aggregation with ES cells and as pseudopregnant recipients. 
Six-to-ten-week-old C57BL/6NCrl mice (Charles River) were used as the host 
for teratoma assays with mouse C2 ES cells. Six-to-ten-week-old C57BL/6NCrl
or B6N-Tyr_c N4/Crl#493 (Charles River) mice were used as the host for mam- 
mary fat pad transplantation of mammary epithelial cells. Six-to-ten-week-old 
NSG/J#5557 mice (Jackson Laboratories) were used as the host for teratoma assays 
with human H1 or CA1 ES cells. FVB/N-Tg(MMTV-PyVT)634Mul/J mice were 
a gift from members of W. Muller’s laboratory, and the backcross to the B6J
background was done by members of A. Pawson’s laboratory. Animals were
maintained on a 12-h light/dark cycle and provided with food and water ad libitum in
individually ventilated units (Techniplast) in the specific-pathogen free facility 
at The Centre for Phenogenomics (TCP). All procedures involving animals were 
performed in compliance with the Animals for Research Act of Ontario and the 
Guidelines of the Canadian Council on Animal Care. Animal protocols performed 
in this study were approved by the Toronto Centre for Phenogenomics Animal Care 
Committee, chaired by A. Jurisicova and L. Phaneuf. The number of animals used 
in different experiments was determined in accordance with similar studies in the 
field; owing to the nature of most experiments, blinding was impossible, because 
the results are visible at the time of analysis. Animals were allocated randomly 
when possible. 

Generation of chimaeras and mouse lines. Morula aggregations were performed
as previously described. Chimaeras were identified at birth by the presence of
black eyes and later by coat pigmentation. Male chimaeras with more than 50% ES
cell contribution to coat colour were bred with CD-1 females to identify germline
transmitters. The transmitter was then bred with C57BL/6NCrl females and pups
were confirmed by genotyping to obtain Cdk1-TK/Cdk1 mice. Cdk1-TK/Cdk1
MMTV-PyMT males were generated by breeding MMTV-PyMT (B6) males and
Cdk1-TK/Cdk1 females. Cdk1-TK/Cdk1 MMTV-PyMT and Cdk1-TK/Cdk1-TK
MMTV-PyMT female mice were generated by breeding Cdk1-TK/Cdk1
MMTV-PyMT males and Cdk1-TK/Cdk1 females.

Teratoma assay. Matrigel Matrix High Concentration (Corning) was diluted 1:3
with cold DMEM medium on ice. Then, 1-5 million mouse ES cells or 5-10 million
human ES cells were suspended in 100 μl of Matrigel-DMEM and injected
subcutaneously into one or both dorsal flanks of C57BL/6NCrl mice (for mouse C2 ES
cells) and NSG/J#5557 mice (for human H1 and CA1 ES cells). Teratomas formed
2-4 weeks after injection. Teratoma size was measured using callipers and volume
was calculated using the formula V = (L × W × H)π/6. GCV or PBS treatment
was performed through daily intraperitoneal injections (50 mg kg⁻¹) with varying
treatment durations. At the end of treatment, mice were euthanized and tumours
were dissected and fixed in 4% paraformaldehyde for histological analysis.
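The volume and dosing arithmetic in the teratoma assay can be illustrated with a minimal sketch. It assumes the volume formula is the standard ellipsoid approximation V = (L × W × H)π/6 and that calliper readings are in millimetres; the function names and the example body mass are hypothetical and are not taken from the study.

```python
import math


def ellipsoid_volume(length_mm, width_mm, height_mm):
    """Ellipsoid approximation of tumour volume, V = (L x W x H) * pi / 6, in mm^3."""
    return length_mm * width_mm * height_mm * math.pi / 6


def daily_dose_mg(body_mass_g, dose_mg_per_kg=50.0):
    """Mass of drug (mg) for one injection at a given mg-per-kg dose.

    The 50 mg/kg default mirrors the GCV dose stated in the text;
    the body mass passed in is whatever the user measures.
    """
    return dose_mg_per_kg * body_mass_g / 1000.0


if __name__ == "__main__":
    # Illustrative numbers only: a 10 x 8 x 6 mm mass in a 25 g mouse.
    print(ellipsoid_volume(10, 8, 6))
    print(daily_dose_mg(25))
```

The π/6 factor is the volume of an ellipsoid expressed through its three full diameters rather than its semi-axes, which is why calliper readings can be used directly.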

Breast cancer transplantation assay. Cdk1-TK/Cdk1 MMTV-PyMT and
Cdk1-TK/Cdk1-TK MMTV-PyMT female mice developed mammary gland tumours
between three and six months of age. Mammary epithelial tumorigenic cells were
isolated from developed tumours by digestion in 10× collagenase-hyaluronidase
(STEMCELL Technologies), diluted to 1× with medium consisting
of DMEM/F12 (Thermo Fisher Scientific), 10% FBS and 50 U ml⁻¹
penicillin-streptomycin, for 1 h at 37 °C. The digested cells were washed and pelleted with
DMEM/F12 and 10% FBS four times, and plated in CnT-PRIME epithelium culture
medium (CELLnTEC Advanced Cell Systems) on plates coated with 0.1% gelatin
(Sigma-Aldrich). Without passaging, primary mammary epithelial cells were
dissociated and resuspended in PBS at 10,000 cells per μl, and 50 μl (500,000 cells)
was transplanted into each mouse by intraductal injection after making a small
abdominal skin incision as previously described. Tumour measurement and PBS
or GCV treatment were the same as described in the teratoma assay.

Differentiation of human ES cells into RPE cells. RPE differentiation was
performed as previously described with minor changes. Human ES cells were
plated on Geltrex-coated six-well plates and cultured in feeder-free conditions
with mTeSR medium until confluency was reached and the colonies lost their tight
borders (7-10 days). Next, the medium was replaced with differentiation medium
(basal medium with 13% KSR) and changed every 2-3 days. The basal medium
consisted of KO-DMEM supplemented with 50 U ml⁻¹ penicillin-streptomycin,
1 mM Na pyruvate, 0.1 mM NEAA, 2 mM GlutaMAX and 0.1 mM
2-mercaptoethanol. Initial pigmentation was observed approximately three weeks after the
switch to differentiation medium. Clusters of RPE cells were manually picked and
transferred to a Geltrex-coated 24-well plate (three clusters per well) when they
were large enough (around 1 mm in diameter) for enrichment and the medium
was changed to RPE medium, which consisted of basal medium with 5% FBS, 7%
KSR and 10 ng ml⁻¹ bFGF (Peprotech).

Differentiation of human ES cells into definitive endoderm. Definitive endo- 
derm differentiation was performed using the STEMdiff Trilineage Differentiation 
Kit (StemCell Technologies) and characterized by immunostaining for SOX17 and 
FOXA2 (Supplementary Table 3). 

Differentiation of human ES cells into pharyngeal pouch endoderm.
Differentiation into pharyngeal pouch endoderm was performed as previously
described, with the only modification being that the induction from ES cells to
definitive endoderm was one day shorter than reported.

Differentiation of human ES cells into mesenchymal stem cells and subsequent 
adipogenic, osteogenic and chondrogenic differentiation. ES cells were cultured 
in mTeSR medium for two days. Next, cells were induced into early mesoderm 
progenitor cells with STEMdiff Mesenchymal Induction Medium (STEMCELL 
Technologies) for four days and then maintained in MesenCult-ACF Medium 
(STEMCELL Technologies). Cells were continually passaged into six-well plates 
precoated with MesenCult-ACF Attachment Substrate (STEMCELL Technologies) 
to derive early mesenchymal progenitor cells. At day 21, the mesenchymal stem 
cells (MSCs) showed a fibroblast-like morphology and the culture medium was 
changed every three days. For adipogenic differentiation, MSCs at a density of 
20,000 cells per well were plated with MesenCult-ACF Attachment Substrate 
and cultured with MesenCult-ACF Medium for two days. Adipogenesis was 
induced using the StemPro Adipogenesis Differentiation Kit (Thermo Fisher 
Scientific). After 21 days, lipid droplets were visualized using Oil Red O (Sigma). 
For osteogenic differentiation, ES-derived MSCs at a density of 50,000 cells per 
well were plated with MesenCult-ACF Attachment Substrate and cultured with 
MesenCult-ACF Medium for two days. Osteogenesis was induced using the 
StemPro Osteogenesis Differentiation Kit (Thermo Fisher Scientific). After 21 days, 
calcium deposition was visualized using Alizarin Red (Sigma-Aldrich). For induc- 
tion of chondrogenic differentiation, ES-derived MSCs were centrifuged in 15-ml 
conical tubes at 500g for 5 min to create cell pellets with 5,000,000 cells per pellet. 
Chondrogenesis was induced using the StemPro Chondrogenesis Differentiation 
Kit (Thermo Fisher Scientific). After 21 days, cartilage was visualized using Alcian 
Blue (Sigma-Aldrich). Differentiation medium was changed every three days.

Differentiation of human ES cells into beating cardiomyocytes. Cardiomyocyte
differentiation was performed using the STEMdiff Cardiomyocyte Differentiation 
Kit (STEMCELL Technologies). 

Differentiation of human ES cells into neuronal progenitors and neurons.
To differentiate human ES cells into neuronal progenitors, human ES cells were
plated at 50,000-100,000 cells per cm² in 1:1 DMEM/F12:Neurobasal (Thermo
Fisher Scientific), 0.5× N2 supplement (made in-house, 1.92 mg ml⁻¹ putrescine,
2.376 μg ml⁻¹ progesterone, 3.6 μM selenium, 10 mg ml⁻¹ apo-transferrin, 0.75%
BSA, 20 g ml⁻¹ insulin), 0.5× B27 supplement with vitamin A (Thermo Fisher
Scientific), 2 mM GlutaMAX, 0.1 mM β-mercaptoethanol, 50 U ml⁻¹
penicillin-streptomycin, 10 μM SB431542 (Selleckchem), 100 nM LDN193189 (Selleckchem)
(10 μM ROCK inhibitor was added overnight only and was subsequently removed).
Cells were maintained in this condition for eight days and the medium was changed
every other day. Next, neuronal progenitors were dissociated with accutase and
plated at a density of 5 × 10⁴ cells per cm² on laminin (Sigma-Aldrich, 1 μl per
1 cm², diluted in 250 μl PBS without Ca and Mg) in fast neuron differentiation
medium: 1:1 DMEM/F12:Neurobasal, 1× B27 supplement with vitamin A, 5 μM
DAPT (Selleckchem), 2 mM GlutaMAX, 0.1 mM β-mercaptoethanol, 50 U ml⁻¹
penicillin-streptomycin. Medium was changed every three days. Then, 10 μM
GCV was added six days after neuron differentiation and kept for six days. After
five days of GCV treatment, 10 μM BrdU (Sigma-Aldrich) was added and the
cultures were fixed after six days of GCV treatment and were immunostained for
BrdU and β-tubulin III (Supplementary Table 3).

Flow cytometry analysis and FACS. Flow cytometry and FACS experiments were
both performed and analysed by the Lunenfeld-Tanenbaum Research Institute
flow cytometry facility. FACS was performed using the ASTRIOS EQ cell sorter.
Flow cytometry was performed using the GALLIOS flow cytometer and evaluated
using Kaluza Analysis Software (Beckman Coulter). Samples were gated for live
single cells using forward scatter, side scatter and DAPI staining. Wild-type and
single-colour samples of the same cell type as the experimental samples were used
as negative controls and for compensation calculations. Human ES cell samples were
single-cell sorted using StemFlex Medium (Thermo Fisher Scientific) and 10 μM
ROCK inhibitor.

Immunostaining. Cells fixed in 4% PFA were blocked and permeabilized with 
5% goat serum, 1 M glycine and 1% Triton X-100 (all Sigma-Aldrich) in PBS 
without Ca and Mg or animal-free blocker (Vector Laboratories) with 1% Triton 
X-100 in milliQ water. All of the primary antibody information can be found in 
Supplementary Table 3. Staining was visualized using a Zeiss LSM780 confocal 
microscope. 

Histology analysis. Paraffin embedding, paraffin block sectioning, and haema- 
toxylin and eosin staining were performed by the Pathology Core of The Centre 
for Phenogenomics. 

qPCR. Gene expression analyses were completed as follows: RNA extraction by 
GenElute Mammalian Total RNA Miniprep Kit (Sigma), reverse transcription 
with QuantiTect Reverse Transcription Kit (Qiagen), qPCR with SensiFAST SYBR 
No-Rox Kit (Bioline) on Bio-Rad CFX Real-Time Systems (Bio-Rad) and analysis 
with Bio-Rad CFX Manager 3.1. All information regarding primers and probes 
for the TaqMan qPCR analyses can be found in Supplementary Table 2. For copy 
number analysis, the reactions were performed using TaqMan Genotyping Master 
Mix (Thermo Fisher Scientific) and CFX Real-Time Systems, and were analysed 
by CopyCaller Software v.2.1 (Thermo Fisher Scientific). 

Luria and Delbrück assay. The Luria and Delbrück assay was performed as
previously described. CDK1-TK homozygous 3C cells were single-cell plated in a
96-well plate using FACS. Subsequently, 21 single-cell-derived cultures were grown
to an average of five million cells per culture, and the numbers of single-positive
cells in each culture were analysed using flow cytometry. The mutation rate was
calculated using the previously described equation available at
https://www.wolframalpha.com/.
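The text does not reproduce the equation that was solved numerically. One classic candidate, offered here purely as an assumption, is the mean-based Luria-Delbrück relation r = a·N·ln(N·C·a), where r is the mean number of mutant cells per culture, N the final cell number per culture, C the number of parallel cultures and a the mutation rate per cell per division. Because a appears both inside and outside the logarithm, the relation has no closed-form solution, which would explain the use of an online solver. The sketch below solves it by bisection; the function name and the example inputs are hypothetical.

```python
import math


def ld_mutation_rate(mean_mutants, cells_per_culture, n_cultures, tol=1e-12):
    """Solve r = a * N * ln(N * C * a) for a by bisection on a log scale.

    Assumption: this is the mean-based Luria-Delbruck equation; the source
    only says that a previously described equation was used.
    """
    r, n, c = mean_mutants, cells_per_culture, n_cultures
    if r <= 0:
        raise ValueError("mean_mutants must be positive")

    def f(a):
        return a * n * math.log(n * c * a) - r

    lo = 1.0 / (n * c)  # f(lo) = -r < 0, and f is increasing above this point
    hi = 1.0            # needs f(hi) > 0, i.e. r < N * ln(N * C)
    if f(hi) <= 0:
        raise ValueError("no solution with a below 1")
    for _ in range(200):
        mid = math.sqrt(lo * hi)  # geometric midpoint suits rates spanning decades
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
        if hi / lo - 1 < tol:
            break
    return math.sqrt(lo * hi)


if __name__ == "__main__":
    # Illustrative input only: 21 cultures of 5e6 cells, 300 mutants on average.
    print(ld_mutation_rate(300, 5e6, 21))
```

Bisection is used instead of Newton iteration because the function is monotonic on the bracket, so convergence does not depend on the starting guess.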

In vivo transplantation of RPEs. CDK1-TK homozygous 3C ES cells were trans- 
fected with the PB-CAGGs-mCherry-pA plasmid and sorted for highly express- 
ing cells. Then, 40,000 3C-derived RPEs only or 30,000 3C-RPEs and 10,000 
mCherry-tagged 3C ES cells were injected subretinally with 0.5%/0.5% (wt/vol) 
hydrogel blend of hyaluronan and methylcellulose (HAMC) in HBSS. PBS or GCV 
(50 mg kg⁻¹) treatments were started the day after cell injection or as stated in
the Figures, and were given every other day through intraperitoneal injections. 
Monitoring by fundoscopy and optical coherence tomography was performed on 
the day after transplantation and then once a week. 

SCL calculations. To establish the rate of mutation in the CDK1-TK allele, we
used our targeted human H1 CDK1-TK-mCherry/CDK1-TK-eGFP dichromatic
cell line, as most mutations in either CDK1-TK allele result in monochromatic
cells. We grew 21 parallel cultures, each from a single dichromatic cell, to an average of
5 × 10⁶ cells per culture (>22 consecutive doublings) and determined the number
of monochromatic cells in the culture using flow cytometry. Next, we applied
Luria-Delbrück fluctuation analysis to calculate the sum of the P1 + P2 + P3
probabilities in the two CDK1-TK alleles. We found that the mutation rate of losing
mCherry was 9.05 × 10⁻⁶ per cell per division while the mutation rate of losing
eGFP was similar at 7.68 × 10⁻⁶ per cell per division (Extended Data Fig. 8a, b).
To further validate the probabilities of these various mutations, we also analysed
published studies that focused on these events. In mouse ES cells, the mutation rate
(P1 + P2 + P3) of changing from a dichromatic to a monochromatic phenotype
in the Rosa26 locus (mouse chromosome 6) was 1 × 10⁻⁶ per cell per division.
Similarly, another study calculated the mutation rate of gene function loss in the
Gdf9 locus (mouse chromosome 11) to be 2.3 × 10⁻⁵ events per cell per division.
Furthermore, the probability of the type 3 mutation, P3 alone, has been calculated
as 1 × 10⁻⁵, 7.2 × 10⁻⁶ and 8.5 × 10⁻⁶ in three different studies by performing
high-G418 selection in mouse ES cells. The P1 + P2 mutation rate has also been
estimated in the human HPRT locus on the X chromosome to be 1.7-6 × 10⁻⁷ by
Luria-Delbrück fluctuation analysis, and 5 × 10⁻⁶ through mutation frequency
analysis in population datasets. Next, we performed Monte Carlo simulations
to establish the SCL of cell batches derived from different SU genotypes. On the
basis of both published data and our own, we used the values P1 = P2 = 10⁻⁶ and
P3 = 2 × 10⁻⁵ per cell per division, all of which are intentional overestimates.
An ES cell population was considered to be a mix of mutant and non-mutant
cells with reference to the CDL-SU locus (or loci). All possible mutations were
categorized into three different types: type 1, when only the SU part of the locus
becomes non-functional (su1 allele); type 2, when both the CDL and SU become
non-functional (su2 allele); type 3, when any of the above occurs as a result of LOH
(Fig. 2a, b). Back mutations, such as su1 to SU, su2 to SU or su2 to su1, were not
considered, because of their extremely low probabilities (Fig. 2c). Back mutations,
such as SU/su1 to SU/SU, SU/su2 to SU/SU and su1/su2 to su1/su1, were
considered as a part of the more frequent LOH process. P1, P2 and P3 were designated
the probabilities of each mutation type, respectively. We distinguished between
two types of P3: P3mr (probability LOH occurred through mitotic recombination),
where both daughter cells survive; and P3cnd (probability LOH occurred through
chromosomal non-disjunction), where one of the daughter cells with the single
remaining copy of the chromosome is likely to die. On the basis of these
probabilities, matrices of transitions between all possible genotypes within one or two
CDL systems were constructed (Fig. 2c, https://github.com/mashutova/failsafe).
With each division cycle (d), all cells within the population, except cells with the
su2/su2 genotype, were allowed to divide. Genotypes with su1/su2 and su1/su1
were considered escapees and the simulation initiated from one non-mutant cell.
n(g1,d) was the number of cells of genotype g1 at doubling d, and P(g1,g2) was the
probability of transition from genotype g1 to g2. In each doubling, the number of
cells changing genotype from g1 to g2 was determined through random sampling
from a binomial distribution with parameters 2n(g1,d - 1) and P(g1,g2). We used
a Poisson approximation of a binomial distribution to work with ultra-low P(g1,g2)
values. For each division, the number of cells of each genotype was assessed, and
the simulation proceeded until the first escapee was detected. For each starting
genotype, we performed more than 10 million simulations and obtained a distribution
of the number of doublings (d) until the detection of the first escapee. On the basis
of these data, we generated a function of SCL (overall number of trials divided by
number of trials with escapees) over cell population size (2^d) (Fig. 2d). Because all
graphs contain almost linear regions, we used linear models to extrapolate them to
high SCL values. To obtain linear regression lines, we used only simulated points
from the linear-like part of the graph (R² > 0.999) with 95% confidence intervals
less than 1,000. To obtain a conservative boundary for the SCL, we used only the
lowest confidence interval values to build linear regressions. To analyse the
outcome of aliquoting a pool of safe-cell cells possibly containing escapees,
we used probability modelling to develop the following formula for the
drop of SCL (for details see equation (1))


[Equation (1): ratio of SCL values expressed in terms of k and A; the typeset formula is not legible in the source.]    (1)
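The simulation loop described before equation (1) can be sketched in a deliberately reduced form. The sketch below is not the authors' code (which is at the GitHub address given in the text): it assumes a single-step model in which any daughter cell becomes an escapee with one fixed probability per division, instead of the full genotype transition matrices. It keeps the three ingredients named in the text: starting from one non-mutant cell, replacing the binomial draw over 2n daughter cells with a Poisson approximation, and stopping at the first escapee. All function and parameter names are hypothetical.

```python
import math
import random


def first_escapee_doubling(p_escape, max_doublings, rng):
    """Return the doubling at which the first escapee appears, or None.

    Simplifying assumption: one mutation step with per-daughter probability
    p_escape is enough to produce an escapee.
    """
    n = 1  # the simulation starts from one non-mutant cell
    for d in range(1, max_doublings + 1):
        daughters = 2 * n
        lam = daughters * p_escape  # Poisson mean approximating Binomial(2n, p)
        # Under the Poisson approximation, P(at least one escapee) = 1 - exp(-lam).
        if rng.random() < -math.expm1(-lam):
            return d
        n = daughters
    return None


def scl_by_doubling(p_escape, max_doublings, trials, seed=0):
    """SCL(d) = number of trials / number of trials with an escapee by doubling d."""
    rng = random.Random(seed)
    hits = [0] * (max_doublings + 1)
    for _ in range(trials):
        d = first_escapee_doubling(p_escape, max_doublings, rng)
        if d is not None:
            hits[d] += 1
    scl = {}
    cumulative = 0
    for d in range(1, max_doublings + 1):
        cumulative += hits[d]
        if cumulative:  # SCL is undefined until at least one trial has escaped
            scl[d] = trials / cumulative
    return scl


if __name__ == "__main__":
    # Illustrative parameters only; the study used far lower probabilities
    # and more than 10 million trials per starting genotype.
    for d, value in scl_by_doubling(1e-3, 12, 20000).items():
        print(d, 2 ** d, round(value, 2))
```

Only the event "at least one escapee" is sampled, because the run stops at the first escapee; a multi-step model would instead need the actual Poisson count to track intermediate genotypes.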


To reduce the complexity of the model, we considered only one escapee event 
in the pool, as the possibility of two independent escapees occurring in a pool 
is low in the quasi-linear phase of SCL. Nevertheless, we tested the effect of this 
omission on the drop of SCL due to aliquoting using Monte Carlo simulations. We 
performed 10 million independent trials for a doubling of 20 and a doubling of 27, 
and obtained a distribution of the number of escapees for each of them. Through 
randomly sampling a number of escapees from each trial to the A aliquots, we
calculated the number of ‘bad’ aliquots containing one or more escapees (Ab). To
calculate a new SCL of the population after aliquoting, where SCLp is the SCL of
the original population, we used the formula A × SCLp/mean(Ab). The drop in
SCL was measured in silico and was compared with the value that we obtained 
from the equation obtained from the probability model. 
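The in silico aliquoting check can be sketched as follows. This is a minimal illustration under stated assumptions, not the published code: each escapee cell is assigned to one of the A aliquots uniformly at random, and mean(Ab) is taken over the trials that contain at least one escapee, which makes A × SCLp/mean(Ab) equal to the total number of aliquots divided by the number of bad aliquots. The function name and the example inputs are hypothetical.

```python
import random


def scl_after_aliquoting(escapee_counts, n_aliquots, seed=0):
    """Return (SCL of the pools, SCL of the aliquots).

    escapee_counts holds the number of escapee cells found in each simulated
    pool (0 when the pool is clean). Assumption: escapee cells land in
    aliquots independently and uniformly.
    """
    rng = random.Random(seed)
    trials = len(escapee_counts)
    bad_per_pool = []
    for k in escapee_counts:
        if k == 0:
            continue
        # Distinct aliquots that received at least one of the k escapees.
        bad = {rng.randrange(n_aliquots) for _ in range(k)}
        bad_per_pool.append(len(bad))
    if not bad_per_pool:
        return None, None  # no escapees in any trial, SCL cannot be estimated
    scl_pool = trials / len(bad_per_pool)
    mean_bad = sum(bad_per_pool) / len(bad_per_pool)
    return scl_pool, n_aliquots * scl_pool / mean_bad


if __name__ == "__main__":
    # Illustrative input: 1,000 pools, 10 of which contain a few escapees.
    counts = [0] * 990 + [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
    print(scl_after_aliquoting(counts, n_aliquots=20))
```

With a single escapee per contaminated pool the aliquot SCL is exactly A times the pool SCL, and the gain shrinks as escapee clones grow large enough to reach several aliquots.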

Code availability. The code used in this manuscript is publicly available at
https://github.com/mashutova/failsafe.

Reporting summary. Further information on experimental design is available in 
the Nature Research Reporting Summary linked to this paper. 


Data availability 
The data that support the findings of this study are available from the correspond- 
ing author upon reasonable request. 
Extended Data Fig. 1 | Generation, genotyping and characterization
of mouse C57BL/6N C2 Cdk1-TK/Cdk1 and Cdk1-TK/Cdk1-TK ES
cells. a, Summary of the targeting steps used to generate mouse C2
Cdk1-TK/Cdk1 and Cdk1-TK/Cdk1-TK ES cells. b, Southern blot genotyping
with internal TK-mCherry probe. c, Southern blot genotyping with
mouse Cdk1 genomic probe. d, In vitro GCV dose-response killing curve
of mouse C2 CDK1-TK/CDK1 ES cells. Data are mean ± s.d., n = 3.
e, Teratoma formation efficiency of mouse C2 Cdk1-TK-PB/Cdk1,
Cdk1-TK/Cdk1 and Cdk1-TK/Cdk1-TK ES cells.
Extended Data Fig. 2 | Generation, genotyping and characterization of
human H1 CDK1-TK/CDK1 and CDK1-TK/CDK1-TK ES cells.
a, Generation of human H1 CDK1-TK/CDK1 and CDK1-TK/CDK1-TK
ES cells. b, Southern blot genotyping of CDK1-TK/CDK1 clone Exc16,
which was used in teratoma assays (Fig. 1c and Extended Data Fig. 7)
and CDK1-TK/CDK1-TK clone Exc16-3C, which was used in the
differentiation assays in Fig. 3. c, PCR genotyping of all the correct clones.
d, Flow cytometry analysis of the CDK1-TK/CDK1 clone Exc16 and the
CDK1-TK/CDK1-TK clone Exc16-3C. e, SybrGreen qPCR of human
CDK1 expression in H1 wild-type cells, and cells expressing the
CDK1-TK/CDK1 clone Exc16 and the CDK1-TK/CDK1-TK clone Exc16-3C.
Data are mean ± s.d., n = 3. No significant difference was found between
groups. f, TaqMan qPCR copy-number analysis of TKs of all clones with
the correct genotype. g, The efficiency of teratoma formation in NSG mice
using human H1 ES cells. h, Dose-response analysis of wild-type,
CDK1-TK/CDK1 and CDK1-TK/CDK1-TK human H1 ES cells. Cells were
treated with different GCV concentrations, dissociated and counted after
seven days. Data are mean ± s.d., n = 3.
Extended Data Fig. 3 | Generation, genotyping and characterization of
human CA1 CDK1-TK/CDK1 and CDK1-TK/CDK1-TK ES cells.
a, Generation of human CA1 CDK1-TK/CDK1 and CDK1-TK/CDK1-TK
ES cells. b, Southern blot genotyping of human CA1 CDK1-TK/CDK1
[…] the backbone contains a ScaI restriction enzyme site, which is consistent
with the sizes of the band in Southern blots. c, Haematoxylin and eosin
staining of a CDK1-TK/CDK1-TK CA1 ES-cell-derived teratoma. d, The
efficiency of teratoma formation in NSG mice using human CA1 ES cells.
Extended Data Fig. 4 | Growth graphs of mouse and human
ES-cell-derived teratomas. a, Growth of teratomas derived from mouse
heterozygous safe-cell ES cells (C2 Cdk1-TK/Cdk1). b, Adult mouse
with stabilized subcutaneous tissue (safe-cell ES-cell-derived dormant
teratoma), 2.5 months after GCV treatment. c, Growth of teratomas
derived from mouse homozygous safe-cell ES cells (C2 Cdk1-TK/Cdk1-TK).
d, Growth of teratomas derived from human heterozygous safe-cell
ES cells (H1 CDK1-TK/CDK1, clone Exc16); daily GCV treatment.
e, Examples of teratomas from human heterozygous safe-cell ES cells
showing cyst formation; images of cystic teratomas at dissection are shown
next to the corresponding growth line; daily GCV treatment. The graphs
with two lines represent mice that had cells injected into both flanks. The
graphs with one line represent mice that had cells injected into one flank.
The GCV treatment regime varies among mice because each teratoma
behaves differently; we started GCV when the teratoma size started to
increase. f, Growth of teratomas derived from human homozygous
safe-cell ES cells (H1 CDK1-TK/CDK1-TK); GCV treatment was every other
day. Images of cystic teratomas are shown next to the corresponding
growth line; cysts were drained after dissection to show the difference in
tumour weight due to the fluid present in the tissue. Each graph represents
one mouse. a, c, d, f, All replicates of these experiments are shown.
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Extended Data Fig. 6 | See next page for caption. 
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Extended Data Fig. 6 | In vitro experiments with mouse C2 
Cdk1-TK/Cdk1 and Cdk1-TK/Cdk1-TK ES cells and subsequent 
characterization of escapees. a, Experimental design: mCherryt cells 
were selected by sorting to ensure that the starting cell population did not 
contain escapees. These cells were plated on six-well plates (200 cells per 
well, in a total of 36 wells) and allowed to grow to 14 cell doublings (this 
was estimated by counting cells in sample wells). The 36 cultures were then 
resuspended to a single-cell suspension and each was plated in a 15-cm 
plate (4 x 10° cells). One day after plating, selection with GCV was started 
and maintained until escapee colonies appeared. b, Escapee numbers 
obtained in 36 independent cultures growing from Cdk1-TK/Cdk1 and 
Cdk1-TK/Cdk1-TK ES cells. c, PCR to determine the presence of TK. 
d, TaqMan copy number qPCR analysis of Akap7, Sim1 and Cdk1 junction
of exon 8 and 3′ UTR, Neurod, Cdk1, TK transgene and Abca on mouse
chromosome 10. Data are the copy number calculated by CopyCaller
Software v.2.1 and the error bars indicate the range from the minimum
to the maximum number. n = 3. The same colour in the background of c
and d indicates that they are from the same independent culture. n.d., not
determined. e, qPCR to compare TK expression level in Cdk1-TK/Cdk1
escapee clone 2A and C2 wild-type ES cells. Data are mean ± s.e.m.,
n = 3. f, Summary of the copy number analysis of mouse Cdk1-TK/Cdk1
escapees. a, b, Experiments were repeated twice on a smaller scale but with 
similar results. d, Experiments were repeated twice with similar results. 
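
As a rough check on the scale of the experimental design in panel a, the expected culture size after 14 doublings can be worked out from the numbers in the caption (200 cells per well, 14 doublings). The sketch below is illustrative only: it assumes ideal doubling with no cell death, and the function names and the escapee count in the usage line are hypothetical, not part of the authors' analysis.

```python
def cells_after_doublings(start_cells: int, doublings: int) -> int:
    """Cell count after a number of population doublings, assuming no cell death."""
    return start_cells * 2 ** doublings


def escapee_frequency(escapee_colonies: int, cells_at_selection: int) -> float:
    """Escapee colonies per cell exposed to selection (hypothetical helper)."""
    return escapee_colonies / cells_at_selection


per_culture = cells_after_doublings(200, 14)
print(per_culture)  # 3276800

# Hypothetical example: 2 escapee colonies in one culture.
print(escapee_frequency(2, per_culture))
```

Under these assumptions each of the 36 cultures reaches roughly 3.3 million cells before selection begins.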
Extended Data Fig. 7 | Schematics of the possible mutation types affecting the CDL-SU allele. a, Three types of mutations that could affect the CDL-SU allele. b, Safe-cell allele transition considered in the modelling and Monte Carlo simulations.
Extended Data Fig. 8 | Quality control of batches generated from single
human H1 CDK1-TK/CDK1-TK ES cells. a, The drop of SCL due to
aliquoting from a pool of cells relative to non-aliquoted batches of the
same size. b, Schematics of the alleles in the CDK1-TK/CDK1-TK human
ES cells used in the quality control. c, Workflow schematic of performing
quality control (QC) on several ES cell batches. d, An example of the flow 
cytometry for the quality control of nine clonally derived batches. e, An 
example of PCR for the quality control of nine clonally derived batches. 
Extended Data Fig. 9 | Mouse and human safe-cell homozygous
CDK1-TK/CDK1-TK cells demonstrate pluripotency. All experiments were
performed using the same clone of mouse C2 or human H1 (Exc16-3C)
CDK1-TK/CDK1-TK cells. a, Bright-field photograph showing mouse
homozygous Cdk1-TK/Cdk1-TK ES cell morphology. b, Bright-field
photograph showing human homozygous CDK1-TK/CDK1-TK ES
cell morphology. c, Haematoxylin and eosin staining of a mouse
Cdk1-TK/Cdk1-TK ES-cell-derived teratoma. d, Haematoxylin and eosin
staining of a human CDK1-TK/CDK1-TK ES-cell-derived teratoma.
e, An adult Cdk1-TK/Cdk1-TK mouse. f, OCT4 and NANOG staining
of human CDK1-TK/CDK1-TK ES cells. g, qPCR characterization of
CDK1-TK/CDK1-TK ES cell differentiation into RPE cells. Data are
mean ± s.d., n = 3. h, Bright-field picture of human CDK1-TK/CDK1-TK
ES-cell-derived RPE cells. i, ZO1 staining of human CDK1-TK/CDK1-TK
ES-cell-derived RPE cells. j, Human CDK1-TK/CDK1-TK ES-cell-derived
adipocytes, chondrocytes and osteocytes. k, SOX17 and FOXA2 staining
of human CDK1-TK/CDK1-TK ES-cell-derived definitive endoderm.
l, OCT4 (also known as POU5F1), NANOG, SOX17, FOXA2 and HOXA3
qPCR characterization of ES cell (day 0) differentiation into definitive
endoderm (day 4) and pharyngeal pouch endoderm (day 8). Data are
mean ± s.d., n = 3. a–d, f–i, Experiments were repeated three times with
similar results. j–l, Experiments were repeated twice with similar results.
Extended Data Fig. 10 | Representative images of eyes transplanted with
both safe-cell RPE and safe-cell ES cells, and only with safe-cell RPE 
cells. a, b, Fundoscopy, optical coherence tomography and fluorescence 
imaging of eyes transplanted with safe-cell RPE and safe-cell ES cells 
(four-week GCV treatment). The absence of a mCherry signal indicates 
that ES cell growth has not occurred. b, Bottom, images of the green 
fluorescence channel are included to illustrate that the observed signal in 
the red fluorescence channel is actually autofluorescence. This experiment 
was repeated four times in four mice with similar results. c, Histological 
analysis of the eye presented in d. d, Fundoscopy, optical coherence 
tomography and fluorescence imaging of eyes transplanted with safe-cell 
RPE and safe-cell ES cells (PBS treatment). This experiment was repeated 
twice in two mice with similar results. e, f, Fundoscopy, optical coherence 
tomography and fluorescence imaging of eyes transplanted with safe-cell
RPE and safe-cell ES cells; mCherry signal is detectable and indicates ES
cell growth. GCV treatment began three weeks post-injection following 
an initial PBS treatment. This experiment was repeated four times in three 
mice with similar results. g, Fundoscopy, optical coherence tomography 
and fluorescence imaging of eyes receiving only safe-cell RPE cells (four- 
week GCV treatment). This demonstrates that GCV treatment did not 
affect the RPE cells. This experiment was repeated five times in three 
mice with similar results. h, Fundoscopy, optical coherence tomography 
and fluorescence imaging of eyes receiving only safe-cell RPE cells (four- 
week PBS treatment). This experiment was repeated six times in three 
mice with similar results. i, Fundoscopy, optical coherence tomography 
and fluorescence imaging of eyes receiving only HAMC (four-week GCV 
treatment). This experiment was repeated twice in one mouse with similar 
results. 
Methicillin-resistant Staphylococcus aureus alters
cell wall glycosylation to evade immunity

David Gerlach, Yinglan Guo, Cristina De Castro, Sun-Hwa Kim, Katja Schlatterer, Fei-Fei Xu, Claney Pereira,
Peter H. Seeberger, Sara Ali, Jeroen Codée, Wanchat Sirisarn, Berit Schulte, Christiane Wolz, Jesper Larsen,
Antonio Molinaro, Bok Luel Lee, Guoqing Xia, Thilo Stehle & Andreas Peschel


Methicillin-resistant Staphylococcus aureus (MRSA) is a frequent 
cause of difficult-to-treat, often fatal infections in humans!*. Most 
humans have antibodies against S. aureus, but these are highly 
variable and often not protective in immunocompromised patients’. 
Previous vaccine development programs have not been successful’. 
A large percentage of human antibodies against S. aureus target wall 
teichoic acid (WTA), a ribitol-phosphate (RboP) surface polymer 
modified with N-acetylglucosamine (GlcNAc)*°. It is currently
unknown whether the immune evasion capacities of MRSA are due 
to variation of dominant surface epitopes such as those associated 
with WTA. Here we show that a considerable proportion of the 
prominent healthcare-associated and livestock-associated MRSA 
clones CC5 and CC398, respectively, contain prophages that encode 
an alternative WTA glycosyltransferase. This enzyme, TarP, transfers 
GlcNAc to a different hydroxyl group of the WTA RboP than the 
standard enzyme TarS’, with important consequences for immune 
recognition. TarP-glycosylated WTA elicits 7.5-40-fold lower levels 
of immunoglobulin G in mice than TarS-modified WTA. Consistent 
with this, human sera contained only low levels of antibodies against 
TarP-modified WTA. Notably, mice immunized with TarS-modified 
WTA were not protected against infection with tarP-expressing 
MRSA, indicating that TarP is crucial for the capacity of S. aureus 
to evade host defences. High-resolution structural analyses of TarP 
bound to WTA components and uridine diphosphate GlcNAc (UDP- 
GlcNAc) explain the mechanism of altered RboP glycosylation and 
form a template for targeted inhibition of TarP. Our study reveals 
an immune evasion strategy of S. aureus based on averting the 
immunogenicity of its dominant glycoantigen WTA. These results 
will help with the identification of invariant S. aureus vaccine 
antigens and may enable the development of TarP inhibitors as a new 
strategy for rendering MRSA susceptible to human host defences. 
Novel prevention and treatment strategies against major antibiotic- 
resistant pathogens such as MRSA are urgently needed but are not 
within reach because some of the most critical virulence strategies 
of these pathogens are not understood®. The pathogenic potential of 
prominent healthcare-associated (HA)-MRSA and recently emerged 
livestock-associated (LA)-MRSA strains is thought to rely on par- 
ticularly effective immune evasion strategies, whereas community- 
associated (CA)-MRSA strains often produce more aggressive toxins!”. 
Most humans have high overall levels of antibodies against S. aureus as a
consequence of preceding infections, but antibody titres differ strongly 
for specific antigens and are often not protective in immunocompro- 
mised patients, for reasons that are not clear*. A large percentage 
of human antibodies against S. aureus is directed against WTA>”!°, 


which is largely invariant. However, some S. aureus lineages produce 
altered WTA, which modulates, for instance, phage susceptibility”! I. 
To investigate whether some prevalent S. aureus lineages use additional 
WTA-targeted strategies to increase their fitness and pathogenicity,
we screened S. aureus genomes for potential additional paralogues
of WTA biosynthesis genes. We found three S. aureus prophages that
encoded a protein, TarP, that has 27% identity to the WTA-β-GlcNAc
Fig. 1 | The phage-encoded TarP can replace the housekeeping WTA
β-GlcNAc transferase TarS. a, TarP is encoded next to different integrase
types (int gene) in prophages ΦtarP-Sa3int (with immune evasion
cluster scn, chp, sak, sep), found in HA-MRSA, and ΦtarP-Sa1int and
ΦtarP-Sa9int, identified in LA-MRSA. TarP variants in ΦtarP-Sa1int
and ΦtarP-Sa9int differed from TarP in ΦtarP-Sa3int in one amino acid
each (I8M and D296N, respectively). Both residues are distant from the
catalytic centre. b, Complementation of S. aureus RN4220 ΔtarM/S with
either tarS or tarP restores susceptibility to infection by WTA GlcNAc-
binding siphophages, as indicated by plaque formation on bacterial lawns.
Data shown are representative of three independent experiments. c, tarP
expression reduces siphophage Φ11-mediated transfer of SaPIbov in N315.
Values indicate the ratio of transduction units (TrU) to plaque-forming
units (PFU) given as mean ± s.d. of three independent experiments.
Statistical significances when compared to wild type were calculated by
one-way ANOVA with Dunnett’s post-test (two-sided) and significant
P values (P < 0.05) are indicated. NO (none obtained) indicates that no
transductants were obtained.
Fig. 2 | TarP protects N315 from podophage infection by alternative
glycosylation of WTA at RboP C3. a, Expression of tarP renders N315
resistant to podophages. Representative data from three independent
experiments are shown. b, ¹H NMR spectra reveal different ribitol
hydroxyl glycosylation of N315 WTA by TarS (C4) or TarP (C3). The
RboP units with attached GlcNAc are depicted above the corresponding
proton resonances. Representative data from three experiments are shown.
In-depth description of the structural motifs identified in the spectra
is given in the Supplementary Information. c, Crystal structure of TarP
homotrimer (pink, orange, grey) bound to UDP-GlcNAc (yellow) and two
Mn²⁺ ions (lime green). The nucleotide-binding domain (NBD), acceptor-


transferase TarS’ (Fig. 1a). tarP was found exclusively in isolates of the 
prominent HA-MRSA CC5", on a prophage that also encoded the 
scn, chp and sak immune evasion genes’, and on two other prophages 
in the emerging LA-MRSAs CC398"4 and CC5". All tarP-harbouring
genomes also contained tarS. 

When tarP from CC5 HA-MRSA strain N315 was expressed in a 
WTA glycosylation-deficient mutant of laboratory strain RN4220’, 
it restored WTA glycosylation (Extended Data Fig. 1a) and suscep- 
tibility to siphophages, which need RboP WTA GlcNAc as a binding 
motif'® (Fig. 1b). The presence of β-GlcNAc on WTA is essential for full
β-lactam resistance in MRSA strains’. When tarP was expressed in a
WTA glycosylation-deficient mutant of CA-MRSA strain MW2 (CC1), 
it restored full oxacillin resistance (Extended Data Fig. 1b), confirming 
that tarP can replace tarS in several key interactions. 

The expression of TarP led to susceptibility to siphophages, albeit to a 
lower extent than TarS (Extended Data Fig. 1c), although TarP did not 
binding domain (ABD), and C-terminal trimerization domain (CTD) of
the pink monomer are labelled. d, Views into the trimer interface (boxed
in c). Left, polar interactions. Hydrogen bonds and salt bridges are shown
as black dashed lines. The Mn²⁺ is 2.1 Å from each Asp316 carboxylate.
Right, hydrophobic interactions, with the mutated residue Ile322
highlighted in red. e, Size-exclusion chromatography elution profiles.
Based on calibration of the column, the TarP wild-type and I322E mutant
proteins have estimated molecular weights of 138 kDa (n = 8) and 42 kDa
(n = 3), respectively, in agreement with the calculated molecular weights of
120 kDa for a TarP trimer and 40 kDa for monomeric TarP.


incorporate less GlcNAc into WTA than TarS (Extended Data Fig. 1d, 
Supplementary Table 3). Similarly, the siphophage-mediated horizontal 
transfer of an S. aureus pathogenicity island was reduced about 
tenfold in S. aureus N315 expressing tarP, compared to the same strain 
expressing only tarS (Fig. 1c), suggesting that TarP and TarS glycosylate 
WTA differently. Notably, N315 was resistant to podophages, but inac- 
tivation of tarP (but not of tarS) rendered it susceptible to podophages 
(Fig. 2a). We analysed the overall effect of tarP on podophage suscep- 
tibility patterns in 90 clinical CC5 and CC398 isolates and found that 
none of the tarP-containing strains, but all of the tarP-lacking strains, 
were susceptible to podophages (Extended Data Table 1). Thus, TarP 
causes podophage resistance and TarP-mediated modification of WTA 
is distinct from that mediated by TarS. Nuclear magnetic resonance 
(NMR) analyses revealed that both TarP and TarS add GlcNAc to WTA 
in the β-configuration. However, the attachment site in RboP differs:
TarS glycosylates the C4 position!” whereas TarP attaches GlcNAc
to C3 (Fig. 2b, Extended Data Fig. 2, Supplementary Table 2). This
difference may be crucial for impairing phage infection. Moreover, 
NMR analysis revealed that TarP is dominant over TarS because in 
N315, which bears both genes, GlcNAc was almost exclusively attached 
to RboP C3 (Fig. 2b). 
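
The association reported above (no tarP-containing isolate susceptible to podophages, every tarP-lacking isolate susceptible) is the kind of pattern that can be tallied with a simple two-by-two count. The following is a minimal sketch with made-up isolate records; the record layout and function names are assumptions for illustration and do not reproduce the data in Extended Data Table 1.

```python
from collections import Counter

# Hypothetical records: (clonal complex, tarP present, podophage susceptible).
isolates = [
    ("CC5", True, False),
    ("CC5", False, True),
    ("CC398", True, False),
    ("CC398", False, True),
]


def tabulate(records):
    """Count isolates by (tarP present, podophage susceptible)."""
    return Counter((has_tarp, susceptible) for _, has_tarp, susceptible in records)


def perfectly_anticorrelated(records):
    """True if tarP carriers are all resistant and non-carriers all susceptible."""
    return all(has_tarp != susceptible for _, has_tarp, susceptible in records)


print(tabulate(isolates))
print(perfectly_anticorrelated(isolates))
```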

We solved the TarP structure at high resolution to elucidate how TarP 
generates a different glycosylation product from TarS. Like TarS'’, TarP 
forms stable homotrimers, but it uses a different trimerization strategy 
because it lacks the C-terminal trimerization domain found in TarS 
(Fig. 2c, Extended Data Fig. 3). Instead, hydrophobic and polar inter- 
actions of a small helical C-terminal domain generate the TarP trimer 
(Fig. 2d, e). WTA polymers comprising three or six RboP repeating 
units (3RboP or 6RboP-(CH2)6NH2, respectively) were synthesized and
used for soaking TarP crystals (Supplementary Information Fig. 2, 3),
yielding the first protein structure visualizing the binding of a WTA-
based polymer (Fig. 3, Extended Data Fig. 4). In the ternary complex
TarP-UDP-GlcNAc-3RboP, the distance between the C3-hydroxyl of
the third unit of 3RboP (RboP3) and the anomeric C1 of GlcNAc is
4.2 Å. Furthermore, at 3.1 Å, Asp181 is well within hydrogen-bonding
distance of the C3-hydroxyl of RboP3. The observed distances
and geometry explain the unusual glycosylation of WTA at the
C3-hydroxyl. We propose that TarP uses a direct SN2-like glycosyltransferase
reaction, as discussed for other inverting GT-A fold enzymes!?”°.
In this mechanism, Asp181 would act as the catalytic base, deprotonat- 
ing the C3-hydroxyl on RboP3 and enabling a nucleophilic attack on 
Fig. 3 | Interactions of TarP with UDP-GlcNAc and D-ribitol-5-phosphate trimer
(3RboP), and comparison of polyRboP binding sites of TarP and TarS. a, 3RboP
binding site in the TarP-3RboP complex, with key amino acids shown (cyan). Asp181
is highlighted in red. The ribitol of 3RboP is coloured green and D-ribitol-5-phosphate
units 1, 2 and 3 (RboP1, RboP2, and RboP3) are labelled. Hydrogen bonds and salt bridges
are shown as black dashed lines. b, Ternary complex of TarP with UDP-GlcNAc and 3RboP.
UDP-GlcNAc, Mg²⁺ and 3RboP are shown as full-atom models coloured yellow, magenta,
and green, respectively. c, View into the active site of TarP. C1 of UDP-GlcNAc and Asp181
are highlighted with red labels. The arrow indicates how the C3-hydroxyl in RboP3
could nucleophilically attack GlcNAc C1. d, Comparison of the polyRboP-binding site
of TarP with the corresponding region in TarS. Residues of TarP and 3RboP are coloured as in
a. TarS residues are coloured violet and the two sulfates are labelled S1 and S2. Only residues
of TarP are labelled, for clarity. Key TarP and TarS residues lining the polyRboP-binding site
are shown at the bottom, with three identical (red) and one conserved (blue) amino acid.
e, Superposition of UDP-GlcNAc-bound TarS with the ternary TarP-UDP-GlcNAc-3RboP
complex. UDP-GlcNAc and 3RboP bound to TarP are coloured as in b, whereas UDP-GlcNAc
bound to TarS is coloured in cyan. Only the TarS residues are shown (coloured as
in d), for clarity. The arrows indicate the C1 positions of UDP-GlcNAc in TarP and TarS.


the GlcNAc C1, thus yielding a β-O-GlcNAcylated polyRboP (Fig. 3c).
Mutagenesis of Asp181 to alanine rendered TarP inactive, supporting 
this putative mechanism (Extended Data Table 2). 
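
The geometric argument above rests on two interatomic distances (4.2 Å and 3.1 Å). As a minimal sketch of how such distances are obtained from atomic coordinates, the code below computes Euclidean distances and applies a hydrogen-bond cutoff. The coordinates are placeholders chosen to reproduce the two quoted values, not coordinates from the deposited structure, and the 3.5 Å cutoff is a commonly used convention assumed here rather than a value stated in the text.

```python
import math

H_BOND_CUTOFF_A = 3.5  # assumed donor-acceptor cutoff in angstroms


def distance(a, b):
    """Euclidean distance between two 3D points given in angstroms."""
    return math.dist(a, b)


def within_h_bond_distance(a, b, cutoff=H_BOND_CUTOFF_A):
    """True if two atoms are close enough to be hydrogen-bond partners."""
    return distance(a, b) <= cutoff


# Placeholder coordinates (hypothetical, for illustration only).
asp181_carboxylate_o = (0.0, 0.0, 0.0)
rbop3_c3_hydroxyl_o = (3.1, 0.0, 0.0)
glcnac_c1 = (3.1, 4.2, 0.0)

print(distance(rbop3_c3_hydroxyl_o, glcnac_c1))
print(within_h_bond_distance(asp181_carboxylate_o, rbop3_c3_hydroxyl_o))
```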

The ternary structure of TarP-UDP-GlcNAc-3RboP allows us to 
predict how polyRboP binds to the homologous TarS enzyme. Three 
residues that are critical for binding and catalysis (including Asp181) 
are identical in TarP and TarS, while five other residues differ!® 
(Fig. 3d). Lys255 and Arg262, for instance, which interact electrostat- 
ically with a WTA phosphate group in TarP, are replaced with Glu248 
and Ser255, respectively, in TarS, which may lead to reduced affinity for 
WTA and might explain why TarP is dominant over TarS in vivo. On 
the basis of the location of UDP-GlcNAc, the identical Tyr149, Asp178 
and Arg252 side chains, the conserved aromatic side chain of Phe256, 
and a site that contains a bound sulfate ion from the crystallization 
solution (S1) and probably binds phosphate in TarS (Fig. 3e), the polyR- 
boP chain would be shifted to the upper right, and the relative position 
of RboP units in the binding site would be altered in TarS. Such an 
altered binding mode would move the C4-hydroxyl of the target RboP 
towards C1 of GlcNAc, allowing TarS to glycosylate at the C4 position. 

S. aureus WTA is a dominant antigen for adaptive immune 
responses™”. The observation that the position of GlcNAc on RboP 
had a profound impact on binding by podophage receptors raises the 
question of whether human antibodies also discriminate between 
the two isomeric polymers and whether MRSA clones use TarP to 
subvert immune recognition. We analysed several human antibody 
Fig. 4 | TarP attenuates immunogenicity of WTA. a, TarP expression
reduces deposition of IgG from human serum on N315 cells. The protein
A gene spa was deleted in all strains. Top, human IgG isolated from three
individual healthy donors (A, B, and C; n = 4); bottom, left, IgG from
human serum enriched for RN4220 WTA binding (n = 4); middle and
right, pooled human IgG from different suppliers (Abcam, n = 4; Athens
R&T, n = 6). Results were normalized against wild type and shown as
means with s.d. of n experiments. P values for comparison with wild type
were calculated by one-way ANOVA with Dunnett’s post-test (two-sided),
and P < 0.05 was considered significant. Significant P values are displayed.
b, TarP reduces neutrophil phagocytosis of N315 strains lacking protein
A, opsonized with indicated concentrations of IgG enriched for WTA
binding. Values are depicted as mean fluorescence intensity (MFI). Means
of two dependent replicates of a representative experiment are shown.


preparations for their capacity to opsonize a panel of N315 strains 
with or without tarP and/or tarS. The mutant lacking any WTA 
glycosylation bound the lowest amount of IgG compared to WTA 
glycosylation-positive strains (Fig. 4a), demonstrating that glyco- 
sylated WTA is a prominent S. aureus antigen in humans. Exclusive 
expression of tarS led to strongly increased IgG binding compared 
to the glycosylation-deficient mutant, indicating that β-GlcNAc on
RboP C4 is an important epitope for human anti-S. aureus anti- 
bodies. By contrast, expression of tarP in the presence or absence 
of tarS led to only slightly increased IgG binding compared to the 
glycosylation-deficient mutant. The capacity of TarP to impair the 
deposition of IgG on S. aureus differed with individual serum donors 
and reached average levels in pooled serum preparations (Fig. 4a). 
When tarP was deleted in three further CC5 isolates, they showed 
similarly increased capacities to bind human serum antibodies compared
to the wild-type strains (Extended Data Fig. 1e). Additionally,
tarP deletion led to a substantially increased capacity of human 
neutrophils to phagocytose opsonized S. aureus (Fig. 4b, Extended 
Data Fig. 1g). Thus, only a small percentage of S. aureus-specific 
antibodies can bind WTA with β-GlcNAc on RboP C3, and tarP-
expressing S. aureus are less likely to be detected and eliminated by 
human phagocytes. 


708 | NATURE | VOL 563 | 29 NOVEMBER 2018 


The other two representative experiments can be found in Extended Data 
Fig. 1g. c, TarP abrogates IgG response of mice towards WTA. For each 
experiment, WTA from N315 ΔtarP or ΔtarS was isolated independently.
At least three mice per group were vaccinated and analysed for specific 
IgG at indicated time points after vaccination. Results are depicted as 
mean absorbance with s.d. Individual mice are indicated by colour. 
Increase in IgG levels was assessed by one-way ANOVA with Tukey’s 
post-test (two-sided). Significant differences (P < 0.05) are indicated 

with corresponding P values. d, Vaccination with WTA does not protect 
mice against tarP-expressing N315, as shown for bacterial loads in kidney 
upon intravenous infection. No significant differences between groups of 
either five vaccinated mice or four mice for the alum control group (means 
indicated), calculated by one-way ANOVA, were observed. 
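
The caption to Fig. 4a states that IgG-binding results were normalized against wild type and reported as means with s.d. The sketch below shows one plausible way to do that normalization. The readings are invented for illustration, the function names are hypothetical, and the use of the sample (n − 1) standard deviation is an assumption, since the caption does not say which estimator was used.

```python
from statistics import mean, stdev


def normalize_to_wild_type(values, wild_type_values):
    """Express each reading as a fraction of the mean wild-type reading."""
    reference = mean(wild_type_values)
    return [v / reference for v in values]


def summarize(values):
    """Return (mean, sample standard deviation) of the readings."""
    return mean(values), stdev(values)


# Hypothetical IgG-binding readings (arbitrary units), n = 4 each.
wild_type = [100.0, 110.0, 90.0, 100.0]
tarp_deletion = [200.0, 220.0, 180.0, 200.0]

relative = normalize_to_wild_type(tarp_deletion, wild_type)
print(summarize(relative))
```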


We purified N315 WTA that had been glycosylated by TarS or 
TarP and used it to immunize mice. Antibodies binding to regular 
(TarS-modified) WTA increased continuously over three weeks after 
vaccination (Fig. 4c). By contrast, no or only very low amounts of IgG 
directed against TarP-glycosylated WTA emerged, indicating that WTA 
modified at RboP C3 is much less immunogenic than WTA modified at 
RboP C4. This experiment was repeated three times with three different 
WTA preparations and yielded broadly similar data. 

Vaccination with S. aureus WTA bearing GlcNAc at RboP C4 
protects mice against infection by CA-MRSA strains USA300 (CC8) or 
USA400 (CC1), which both lack tarP>”!. Remarkably, vaccination with 
regular (TarS-modified) or TarP-modified WTA did not lead to any 
notable protection against subsequent infection with tarP-expressing 
N315 compared to mock vaccination, despite the robust antibody 
response against regular WTA (Fig. 4d). Together, our results demon- 
strate that tarP protects S. aureus against adaptive host defences 
by allowing bacteria to evade recognition by preexisting anti- 
S. aureus antibodies and by exploiting the poor immunogenicity of 
TarP-modified WTA. 

It is possible that TarP-modified WTA mimics a currently unknown 
autoantigen and is therefore hardly immunogenic. On the other hand, 
regular S. aureus WTA can be ingested by antigen-presenting cells 
and presented to T cells, in a largely unexplored way, thereby evoking
specific immunoglobulins and immunological memory*”™. It is possi- 
ble that TarP-modified WTA is refractory to this process. Thus, TarS- 
and TarP-modified WTA could be helpful for decoding glycopolymer 
presentation pathways and for defining the most promising WTA 
epitopes for the development of protective vaccines against S. aureus. 

Protection against S. aureus infections is urgently needed, in particu- 
lar for hospitalized and immunocompromised patients™*. Antibodies 
can in principle protect against S. aureus, but their titres and specif- 
icities vary largely among humans and they are often not protective 
in immunocompromised patients’, probably in particular against 
S. aureus clones that mask dominant epitopes, for instance using TarP. 
Unfortunately, all previous human vaccination attempts with protein 
or glycopolymer antigens have failed, for reasons that are unclear’. 
Our study identifies a new strategy used by pandemic MRSA clones to 
subvert antibody-mediated immunity, which should be considered in 
future vaccination approaches. S. aureus WTA with GlcNAc at RboP C3 
has been reported as a type-336 antigen, but was not further explored”. 
We found that tarP is present in type-336 S. aureus (Extended Data 
Fig. 1f). However, TarP-modified WTA is a very poor antigen and 
vaccines directed against GlcNAc at WTA RboP C3 or C4 may fail 
against many of the pandemic MRSA clones. The structural character- 
ization of TarP will instruct the development of specific TarP inhibitors 
that could become important in combination with anti-WTA vaccines
or antibiotic therapies. We found tarP-encoding prophages in 70-80% 
of south-west German HA-MRSA CC5 and 40% of Danish LA-MRSA 
CC398 isolates (Extended Data Table 1), pointing to a crucial role of 
tarP in the fitness of these lineages and raising concerns of further 
dissemination by horizontal gene transfer. TarP is a new and probably 
crucial component of the S. aureus virulence factor arsenal?°?’, high- 
lighting the important roles of adaptive immunity and its evasion in 
S. aureus infections. 
METHODS


No statistical methods were used to predetermine sample size. The experiments 
were not randomized. The investigators were not blinded to allocation during 
experiments and outcome assessment. 

Bacterial strains and growth conditions. S. aureus strains N315, RN4220, and 
MW2 (wild type and mutants) were used for this study. Collections of CC5 isolates 
of the Rhine-Hesse pulsed-field gel electrophoresis type”* and of the LA-MRSA 
lineage CC398 from the Danish Statens Serum Institut”**? were analysed for the 
presence of tarP and for podophage susceptibility. Additionally, 48 spa-type t002 
(ST5) and 16 spa-type t003 (ST225) isolates were obtained from the MRSA collection
of the University Hospital Tübingen and analysed for tarP presence by PCR.
S. aureus strains were cultivated in tryptic soy broth (TSB) or basic medium (BM;
1% tryptone, 0.5% yeast extract, 0.5% NaCl, 0.1% glucose, 0.1% K2HPO4, w/v).
MICs of oxacillin were determined by microbroth dilution according to established 
guidelines*!. 
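
For convenience, the w/v percentages given for basic medium (BM) can be converted into weighed amounts for a chosen batch volume, using the definition that 1% (w/v) equals 1 g per 100 ml. This is a small illustrative helper, not part of the published protocol; the dictionary and function names are hypothetical.

```python
# BM composition in % (w/v), as listed in the text.
BM_PERCENT_WV = {
    "tryptone": 1.0,
    "yeast extract": 0.5,
    "NaCl": 0.5,
    "glucose": 0.1,
    "K2HPO4": 0.1,
}


def grams_needed(percent_wv, volume_ml):
    """Grams of each component for a batch; % (w/v) means g per 100 ml."""
    return {name: pct * volume_ml / 100 for name, pct in percent_wv.items()}


for component, grams in grams_needed(BM_PERCENT_WV, 1000).items():
    print(f"{component}: {grams:.1f} g per litre")
```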

Experiments with phages. tarP-encoding phages were identified in genome 
sequences using the webtool Phaster* in representative strains listed with 
GenBank accession: ΦtarP-Sa3int with immune evasion cluster (IEC) in
CC5 (strain N315, BA000018.3), ΦtarP-Sa1int, found in LA-MRSA of CC5
(strain ISU935, CP017090), and ΦtarP-Sa9int found in CC398 (strain E154,
CP013218). 

Phage susceptibility was determined using a soft-agar overlay method!®. In brief, 

10 µl phage lysate of 10*-10° PFU was dropped onto soft agar containing 100
µl bacterial suspension (OD600 of 0.1). Plates were incubated at 37°C overnight.
The efficiency of plating was determined as described**. Transfer of SaPIs was 
determined according to previously described methods!!. In brief, SaPI particle 
lysates were generated from S. aureus strain JP1794, which encodes a SaPI with a 
resistance marker for tetracycline™. PFU of SaPI lysate was determined on RN4220. 
200 µl bacterial culture (OD600 of 0.5) was mixed with 100 µl of SaPI particle lysate
(SaPIbov1 (Φ11), 10° PFU/ml) and incubated at 37°C for 15 min. Appropriate dilutions
were plated on TSB plates containing 3 µg/ml of tetracycline, and CFU were
checked after overnight incubation. 
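
The transduction readout reported in Fig. 1c is a ratio of transduction units (TrU) to plaque-forming units (PFU). A minimal sketch of the underlying arithmetic is given below, assuming that transductant colonies are counted as TrU and back-calculated to the undiluted sample from the dilution factor and plated volume. All counts, dilutions and volumes in the example are hypothetical, as are the function names.

```python
def titre_per_ml(count, dilution_factor, plated_volume_ml):
    """Units per ml of undiluted sample from a plate count.

    dilution_factor is the fold dilution of the plated sample (e.g. 100 for 1:100).
    """
    return count * dilution_factor / plated_volume_ml


def tru_per_pfu(tru_per_ml, pfu_per_ml):
    """Ratio of transduction units to plaque-forming units."""
    return tru_per_ml / pfu_per_ml


# Hypothetical counts: 50 tetracycline-resistant colonies from 0.1 ml of a
# 1:100 dilution, and 80 plaques from 0.1 ml of a 1:10,000 dilution.
tru = titre_per_ml(50, 100, 0.1)
pfu = titre_per_ml(80, 10_000, 0.1)
print(tru_per_pfu(tru, pfu))
```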
WTA isolation and structure analysis. WTA from S. aureus was isolated and 
purified according to previously described methods". In brief, WTA was released 
from purified peptidoglycan by treatment with 5% trichloroacetic acid and dialy- 
sed extensively against water using a Spectra/Por3 dialysis membrane (MWCO of 
3.5 kDa; VWR International GmbH). Obtained soluble WTA was quantified by 
determining the content of phosphate and GlcNAc*®. For PAGE analysis of WTA,
samples (400 nmol of phosphate per lane) were applied to a 26% polyacrylamide
(Rotiphorese Gel 40 (19:1)) resolving gel and separated at 25 mA for 16 h?”. The
gel was equilibrated in a solution of 40% ethanol and 5% acetic acid at room
temperature for 1 h and the WTA ladders were visualized by incubation with alcian
blue (0.005%) for several hours. 

NMR spectroscopy experiments were carried out on a Bruker DRX-600 
spectrometer equipped with a cryo-probe, at 288 K (WT-WTA, TarS-WTA, and 
TarP-WTA) or 298 K (double-mutant WTA lacking any glycosylation). Chemical 
shifts of spectra recorded in D2O were calculated in p.p.m. relative to internal 
acetone (2.225 and 31.45 p.p.m.). The spectral width was set to 10 p.p.m. and the 
frequency carrier placed at the residual HOD peak, suppressed by pre-saturation. 
Two-dimensional spectra (TOCSY, gHSQC, gHMBC, and HSQC-TOCSY) were 
measured using standard Bruker software. For all experiments, 512 FIDs of 2,048 
complex data points were collected, 32 scans per FID were acquired for homo- 
nuclear spectra, and 20 or 100 ms of mixing time was used for TOCSY spectra. 
Heteronuclear ¹H–¹³C spectra were measured in the ¹H-detected mode. The gHSQC
spectrum was acquired with 40 scans per FID, using the GARP sequence for
¹³C decoupling during acquisition; the gHMBC spectrum used twice as many scans
as the gHSQC spectrum. For HSQC-TOCSY, the version with multiplicity editing during
the selection step was used, with three times as many scans as the HSQC spectrum, and
two experiments were acquired with mixing times of 20 or 80 ms. During processing, each data
matrix was zero-filled in both dimensions to give a matrix of 4K × 2K points
and was resolution-enhanced in both dimensions by a cosine-bell function before 
Fourier transformation; data processing and analysis were performed with the 
Bruker Topspin 3 program. 
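
The processing steps named above (cosine-bell resolution enhancement followed by zero-filling before Fourier transformation) can be sketched in one dimension as follows. This is a conceptual illustration in plain Python, not the Topspin implementation: the window definition cos(πk/2N) is one common form of a cosine bell and is assumed here, and the function names are hypothetical.

```python
import math


def cosine_bell(n):
    """Cosine-bell window: 1 at the first point, decaying towards 0 at the end."""
    return [math.cos(math.pi * k / (2 * n)) for k in range(n)]


def apodize_and_zero_fill(fid, target_len):
    """Multiply an FID by a cosine bell, then pad with zeros to target_len."""
    if target_len < len(fid):
        raise ValueError("target_len must be at least the FID length")
    window = cosine_bell(len(fid))
    weighted = [x * w for x, w in zip(fid, window)]
    return weighted + [0.0] * (target_len - len(fid))


# Toy example: a 4-point constant signal padded to 8 points.
print(apodize_and_zero_fill([1.0, 1.0, 1.0, 1.0], 8))
```

In a two-dimensional experiment the same operation would be applied along each dimension in turn before the transform.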

Molecular biology. All primers used for PCR, cloning, and mutagenesis are 
listed in Supplementary Table 1. tarP (UniProt A0A0H3JNB0, NCBI Gene ID
1260584) was amplified using genomic DNA of S. aureus N315 and inserted in
Escherichia coli/S. aureus shuttle vector pRB474** at the BamHI and SacI sites, to
transform S. aureus, or into pQE80L at BamHI and HindIII sites, to transform
E. coli BL21(DE3). A thrombin cleavage site was inserted between the His-tag and 
mature protein in pQE80L. Single mutations of TarP were introduced by PCR- 
based site-directed mutagenesis*’. The obtained amplicons were confirmed by 
sequencing. For the construction of marker-less S. aureus deletion mutants of tarS 
or tarP, the pIMAY shuttle vector was used“°. The IgG-binding surface protein A
gene (spa) was deleted using the pKOR1 shuttle vector*’. Protein A deletion had
no impact on siphophage or podophage susceptibility, indicating that it did
not alter WTA amount or structure. 

Protein expression, purification, and activity assay. E. coli BL21(DE3) were grown 
in LB medium at 30°C. Expression of tarP was induced with 1 mM IPTG at 22°C at 
an OD₆₀₀ of 0.6. After 15 h, cells were harvested, washed with wash buffer (50 mM
Tris-HCl, pH 8.0, 1 mM EDTA), and lysed by sonication in lysis buffer (70 mM
NaH₂PO₄, pH 8.0, 1 M NaCl, 20% glycerol, 10 U/ml of benzonase nuclease).
After centrifugation (15,000g), the supernatant was passed through a 0.45 µm filter,
loaded onto a HisTrap FF column (GE Healthcare, 5 ml), and washed with buffer
A (50 mM NaH₂PO₄, pH 8.0, 1 M NaCl, 20% glycerol) supplemented with 45 mM
imidazole and then with buffer B (buffer A with 90 mM imidazole). Finally, the protein was
eluted with buffer C (buffer A with 500 mM imidazole), and the fractions were
pooled and further purified by size-exclusion chromatography on a Superdex
200 10/30 column equilibrated with buffer D (20 mM MOPS, pH 7.6, 400 mM
LiCl, 10 mM MgCl₂, 5 mM β-mercaptoethanol, 5% glycerol). The peak fractions
were pooled and concentrated to 1.4 mg/ml for crystallization. For selenomethio- 
nyl-form TarP production, bacteria were grown in a selenomethionine-containing 
medium (Molecular Dimension) and auto-induction was carried out. The protein 
was purified as described above. The activity of wild-type and mutated TarP, as well
as the donor substrate specificity of TarP, was determined with the ADP Quest Assay
kit (DiscoverRx; Extended Data Tables 2, 3). The reaction volume was 20 µl with
1 mM UDP-GlcNAc and 1.5 mM purified WTA from RN4220 ΔtarM/S. The reaction
was started by adding protein and incubated at room temperature for 1 h. Released UDP,
coupled to a fluorescence signal, was detected in a 384-well black assay plate with
530 nm excitation and 590 nm emission wavelengths using a TECAN Infinite M200.

Crystallization and data collection. Crystals were obtained by vapour diffusion
at 20°C. 1 µl protein solution was mixed with 1 µl reservoir solution containing
25% PEG 3350, 250 mM MgCl₂, and 0.1 M sodium citrate, pH 5.7. The
selenomethionyl-form protein was crystallized under the same conditions. For crystals of
TarP with UDP-GlcNAc, 27 mM UDP-GlcNAc was introduced in the reservoir
solution containing 250 mM MgCl₂ or 230 mM MnCl₂. Crystals of TarP with Mg²⁺
were used for soaking of synthetic 3RboP (60 mM), 6RboP-(CH₂)₆NH₂ (41 mM),
or UDP-GlcNAc (20 mM) combined with 3RboP (52 mM) for 5 min. For data
collection the crystals were cryo-protected with 20% glycerol in reservoir solution 
and flash-frozen in liquid nitrogen. Diffraction data were collected at beamline 
X06DA of Swiss Light Source in Villigen, Switzerland, or at beamline BL14.1 at 
BESSY-II, Helmholtz Zentrum Berlin. 

Phasing, model building, and refinement. For phase determination, two data sets 
from a selenomethionine-containing TarP crystal were collected at wavelengths of 
0.97941 Å (peak) and 0.97952 Å (inflection). The structure was solved by multi-
wavelength anomalous dispersion (MAD) at 2.60 Å resolution. All data were
reduced using the XDS/XSCALE software packages. Initial phases were derived from
the substructure of 26 selenium atom sites per asymmetric unit with the program
suite SHELX C/D/E. The heavy atom parameters were further refined and the
initial phases were improved by SHARP/autoSHARP. The initial model was
generated with PHENIX and the final model was achieved by cycles of iterative
model modification using COOT and restrained refinement with REFMAC. TLS
was used in the later stages. The four binary and one ternary complex structures
were solved by molecular replacement using PHASER, with the unliganded TarP
structure as the search model. UDP-GlcNAc, 3RboP, Mg²⁺, or Mn²⁺ were
removed from the models to calculate the simulated annealing (mFo − DFc) omit
maps using PHENIX. The anomalous difference map of Mn²⁺ at 1.89259 Å was
generated by FFT within CCP4, from which two Mn²⁺ in the active site and one
Mn²⁺ at the trimer interface were identified. The coordinate and parameter files
for 3RboP and 6RboP-(CH₂)₆NH₂ were calculated using the PRODRG server.
The structure figures were generated by PYMOL and the models were evaluated
using MolProbity. Statistics for the data collection, phasing, and refinement are
reported in Extended Data Tables 4 and 5. 

Synthesis of ribitol phosphate oligomers. Synthesis of 3RboP. Target compound 
1, D-ribitol-5-phosphate trimer (3RboP), was prepared by the phosphoramidite
method (Supplementary Fig. 2). In brief, the primary alcohol of commercially
available compound 2 was converted into levulinoyl ester by using levulinic acid 
and N,N'-dicyclohexylcarbodiimide (DCC), and the allyl group of 3 was removed 
with tetrakis(triphenylphosphine)palladium to produce compound 4. The primary 
alcohol of 4 reacted with phosphine derivative 5 in the presence of diisopropylammonium
tetrazolide to generate phosphoramidite 6. At the same time, compound 4
was coupled with dibenzyl N,N-diisopropylphosphoramidite 7, catalysed by
1H-tetrazole, and the product was further oxidized by tert-butyl hydroperoxide,
yielding protected D-ribitol-5-phosphate 8. Cleavage of the levulinoyl ester
of 8 with hydrazine hydrate resulted in benzyl-protected D-ribitol-5-phosphate 9,
which was further coupled with phosphoramidite 6 and oxidized with tert-butyl
hydroperoxide to yield protected dimers of D-ribitol-5-phosphate 10. After removal
of the levulinoyl group, the dimer 11 was coupled with phosphoramidite 6 using
the same conditions as above to obtain a protected trimer of D-ribitol-5-phosphate
12. Subsequent removal of the levulinoyl group and hydrogenolysis of 13 to remove
all benzyl groups yielded 3RboP 1. All chemicals and experimental procedures as 
well as characterization of products can be found in the Supplementary Methods. 
Synthesis of 6RboP-(CH₂)₆NH₂. Aminohexyl D-ribitol-5-phosphate hexamer
(6RboP-(CH₂)₆NH₂) was synthesized using a new method (Supplementary Fig. 3).
All chemicals (Acros, Biosolve, Sigma-Aldrich and TCI) for the synthesis were used 
as received and all reactions were performed under a protective argon atmosphere 
at room temperature, unless otherwise stated. Procedures for phosphoramidite 
coupling, oxidation, detritylation, global deprotection, TLC analysis and charac- 
terization of these compounds can be found in Supplementary Methods. 

Human samples. Venous blood samples were obtained from male and female
healthy volunteers (20-50 years) with protocols approved by the Institutional
Review Board for Human Subjects at the University of Tübingen (014/2014BO2
and 549/2018BO2). Informed written consent was obtained from all volunteers.
Blood samples were used for purification of either serum IgGs or neutrophils as 
described below. 

IgG from human plasma. IgG was purified from plasma of human donors using 
the NAb Protein G Spin Kit (ThermoFisher), purity was checked by SDS PAGE, and 
protein concentration was determined using the Bradford assay. Anti-WTA-IgG was
prepared as described. To analyse the IgG-binding capacity of S. aureus cells, exponentially
growing bacterial cultures were adjusted to an OD₆₀₀ of 0.5, diluted 1:10
in PBS, and 100 µl of diluted bacteria was mixed with 100 µl of IgG diluted in PBS
with 1% BSA. The concentration of IgG was 250 ng/ml for IgG enriched for WTA
binding, 10 µg/ml for IgG from pooled human serum (Athens R&T 16-16-090707,
Abcam ab98981), or 5 µg/ml for single-donor IgG preparations. A control without
IgG was included in all experiments for all mutants. Samples were incubated at
4°C for 1 h, centrifuged, washed 2-3 times with PBS, and further incubated with
100 µl FITC-labelled anti-human IgG (Thermo Scientific, 62-8411, 1:100 in PBS
with 1% BSA) at 4°C for 1 h. Bacteria were centrifuged, washed 2-3
times with PBS, and fixed with 2% paraformaldehyde (PFA). Surface-bound IgG 
was quantified by flow cytometry using a BD FACSCalibur. For all flow cytometry 
experiments a mutant panel lacking spa, the gene for the IgG-binding protein A, 
was used. The subsequent gating strategy is exemplified in Extended Data Fig. 5a. 

IgG-mediated phagocytosis. Stationary-phase S. aureus cells were washed once
with PBS and labelled by incubation in PBS containing 10 µM carboxyfluorescein
succinimidyl ester (CFSE; OD₆₀₀ of 1.7) at 37°C for 1 h. The bacteria were washed
three times and resuspended in PBS. CFU were determined by plating on TSB
plates and bacteria were heat-inactivated at 70°C for 20 min. CFSE-labelled
S. aureus (1 × 10⁷ cells/ml) in PBS with 0.5% BSA were opsonized with anti-WTA-
IgG (0.15 or 0.3 ng/µl) at 4°C for 40 min. Neutrophils from human donors, isolated
via Ficoll-Histopaque density gradient centrifugation, were diluted to a concentration
of 2.5 × 10⁶/ml in neutrophil medium (10% HSA, 2 mM L-glutamine,
2 mM sodium pyruvate, 10 mM HEPES). 200 µl neutrophil suspension was incubated
with 25 µl opsonized bacteria (final MOI 0.5) in a 96-well plate at 37°C for
30 min, centrifuged (350g, 10 min), washed once with 200 µl PBS, and fixed with
2% PFA at room temperature for 15 min. Cells were washed twice with PBS and 
analysed by flow cytometry, whereby surface-bound and ingested bacteria were 
measured without discrimination. An example of the neutrophil gating strategy 
can be found in Extended Data Fig. 5b. 

Mice. Six-week-old sex-matched wild-type C57BL/6J mice, purchased from 
ORIENT BIO (Charles River Breeding Laboratories in Korea), were kept in 
micro-isolator cages in a pathogen-free animal facility. The conducted experi- 
ments were performed according to guidelines and approval (PNU-2017-1503) 
by the Pusan National University-Institutional Animal Care and Use Committee 
(PNU-IACUC). Sample size was chosen to obtain significant outcomes (alpha 
error < 5%), based on results from previous experiments. Experiments were
performed in a non-blinded, non-randomized fashion. 

Mouse vaccination and infection. 30 µg of purified WTA from S. aureus N315
wild-type or isogenic ΔtarP or ΔtarS mutants was dissolved in 15 µl PBS and
mixed with the same volume of aluminium hydroxide gel adjuvant (Alhydrogel
1.3%, 6.5 mg/ml, Brenntag). The mixtures were incubated at 37°C with agitation
for 1 h and injected three times at one-week intervals via mouse footpads. Seven
days after the third injection, blood was obtained from the retro-orbital sinus
and centrifuged (9,000g) at 4°C for 10 min. The supernatants were aliquoted
(50 µl) and stored at −80°C for ELISA quantification of WTA-binding IgG as
described. Sera were diluted 1:100 and tested by ELISA on 96-well plates coated
with 2.5 µg/ml sonicated WTA preparations (WTA from N315, ΔtarS or ΔtarP,
respectively). 

To prepare an inoculum for infection, N315 wild-type bacteria were grown in 
TBS at 37°C with agitation (180 r.p.m.) until they reached an OD₆₀₀ of 1.0. After
centrifugation (3,500g) at 4°C for 10 min, bacteria adjusted to 5 × 10⁷ CFU in 50 µl
PBS containing 0.01% BSA were intravenously injected (n = 5 per group). Injected
bacterial numbers were verified by plating serial dilutions of the inoculum onto
TSA plates. To determine residual bacterial dissemination to kidneys, challenged
mice were euthanized, and organs were extracted aseptically and homogenized
in 1 ml of saline using a Polytron homogenizer (PT3100). The homogenates were 
serially diluted and plated on TSA to determine CFU counts. CFU were calculated 
per 1 ml of kidney. 
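
The CFU determination by serial dilution and plating described above reduces to a short calculation. The sketch below is illustrative only: the function names, the plated volume and the example colony count are hypothetical and are not taken from the paper.

```python
def cfu_per_ml(colonies, dilution, plated_volume_ml):
    """CFU per ml of the undiluted sample.

    colonies: colonies counted on the plate
    dilution: dilution of the plated sample relative to the original (e.g. 1e-4)
    plated_volume_ml: volume spread on the plate, in ml (hypothetical value in the example)
    """
    if colonies < 0:
        raise ValueError("colony count cannot be negative")
    if not 0 < dilution <= 1:
        raise ValueError("dilution must be in the interval (0, 1]")
    if plated_volume_ml <= 0:
        raise ValueError("plated volume must be positive")
    return colonies / (dilution * plated_volume_ml)


def cfu_in_homogenate(colonies, dilution, plated_volume_ml, homogenate_volume_ml=1.0):
    """Total CFU in an organ homogenate of the given volume (1 ml in the text)."""
    return cfu_per_ml(colonies, dilution, plated_volume_ml) * homogenate_volume_ml
```

As a hypothetical example, 42 colonies on a plate that received 0.1 ml of a 10⁻⁴ dilution correspond to about 4.2 × 10⁶ CFU per ml of homogenate.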

Statistical analyses. Statistical analysis was performed by using GraphPad Prism 
(GraphPad Software, Inc.). Statistically significant differences were calculated by 
appropriate statistical methods as indicated. P values of < 0.05 were considered 
significant. 
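
Several figure legends in this paper report a paired, two-sided Student's t-test. As a minimal sketch of what that test computes (the authors used GraphPad Prism; the function name and the example values here are hypothetical), the statistic and its degrees of freedom can be obtained as follows. Converting the statistic to a P value requires the t distribution with n − 1 degrees of freedom, which is left to a statistics package.

```python
import math


def paired_t_statistic(x, y):
    """Return (t, degrees of freedom) for paired samples x and y."""
    if len(x) != len(y):
        raise ValueError("paired samples must have the same length")
    n = len(x)
    if n < 2:
        raise ValueError("at least two pairs are required")
    diffs = [b - a for a, b in zip(x, y)]
    mean_diff = sum(diffs) / n
    # Sample variance of the pairwise differences (n - 1 in the denominator).
    var_diff = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)
    if var_diff == 0:
        raise ValueError("differences have zero variance; t is undefined")
    return mean_diff / math.sqrt(var_diff / n), n - 1
```

For the hypothetical pairs (1, 2), (2, 4) and (3, 6), the differences are 1, 2 and 3, giving t = 2√3 ≈ 3.46 with 2 degrees of freedom.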

Reporting summary. Further information on experimental design is available in 
the Nature Research Reporting Summary linked to this paper. 


Data availability 

All major data generated or analysed in this study are included in the article or 
its supplementary information files. The coordinates and structure factors were 
deposited in the Protein Data Bank under accession numbers 6H1J, 6H21, 6H2N, 
6H4F, 6H4M and 6HNQ. Source data for experiments with animals (Fig. 4c, d) 
are provided. Additionally, a gel image of Extended Data Fig. 1f is supplied as 
Supplementary Fig. 1. All other data relating to this study are available from the 
corresponding authors on reasonable request. 

Extended Data Fig. 1 | Characterization of TarP, deposition of human
IgGs, and presence of tarP in the producer of antigen 336. a, Analysis
of WTA by PAGE. WTA from RN4220 ΔtarM/S expressing either tarP
or tarS was compared with non-glycosylated WTA. Data shown are
representative of two experiments. b, MIC values of oxacillin against
MW2 wild type, tarS mutant, and tarP-complemented tarS mutant. Data
are medians of ten independent experiments. c, Efficiency of plating
(EOP) of phage Φ11 against tarS- or tarP-expressing RN4220 ΔtarM/S.
Values of tarP relative to tarS expression are given as mean ± s.d. (n = 3).
Statistical significance was calculated by paired Student's t-test (two-
sided) with significant P values (P < 0.05) indicated. d, The level of WTA
glycosylation catalysed by TarP or TarS was determined by analysing the
GlcNAc and phosphate content of WTA isolated from a N315 strain panel.
Depicted is the ratio of GlcNAc and phosphate as mean with s.d. of three
technical replicates. The values are in good agreement with NMR data
(Supplementary Table 3). e, Relative deposition of IgG from intravenous
immunoglobulins enriched for WTA binding on different CC5 wild-
type and tarP mutant cells. Values are given as mean percentage ± s.d.
of four independent experiments. Statistical significance was calculated
by paired Student's t-test (two-sided). P values < 0.05 were considered
significant and are indicated. f, Presence of tarP and tarS in S. aureus
ATCC55804, expressing antigen 336, described as 3-O-GlcNAc-WTA.
Shown is a representative of two independent replicates. g, TarP reduces
neutrophil phagocytosis of N315 strains lacking protein A, opsonized
with indicated concentrations of IgG enriched for WTA binding. Values
are depicted as mean fluorescence intensity (MFI). Shown are two
independent experiments with neutrophils from different donors. These
data supplement data presented in Fig. 4b: upper panel, mean of three
technical replicates of an independent experiment; lower panel, mean of
two technical replicates.

Extended Data Fig. 2 | NMR analysis of WTA from N315 mutant
panel. All depicted experiments were repeated twice. y-axes and x-axes
show ¹³C and ¹H chemical shifts, respectively. a–d, NMR spectra of non-
glycosylated WTA (ΔtarSΔtarP mutant). a, HSQC expansion of the
region containing the ribitol and glycerol protons shifted by acylation.
b, c, HSQC-TOCSY-20 and HSQC-TOCSY-80 spectra, respectively.
d, HSQC area of the non-acylated ribitol and glycerol protons. e–h, NMR
spectra of TarS-WTA (ΔtarP mutant). e, HSQC expansion of the region
containing the ribitol and glycerol protons shifted by acylation.
f, g, HSQC-TOCSY-20 and HSQC-TOCSY-80, respectively. h, HSQC area
of the non-acylated ribitol and glycerol protons. i–o, NMR spectra of TarP-
WTA (ΔtarS mutant). i, HSQC expansion of the region containing the
ribitol and glycerol protons shifted by acylation. j, k, HSQC-TOCSY-20
and HSQC-TOCSY-80 spectra, respectively. l, HSQC area of the non-
acylated ribitol and glycerol protons. m, Expansion of l with HSQC
(black/grey) overlapped with HSQC-TOCSY-20 (cyan). n, Overlap of
HSQC-TOCSY-20 (cyan) and HSQC-TOCSY-80 (black). o, HSQC (black)
and HMBC (grey) detailing the GlcNAc signals. p, NOESY expansion
detailing the correlations of the β-GlcNAc anomeric protons: GlcNAc ‘b*’
differs from unit ‘b’, which has the same anomeric proton chemical shift,
but is linked to a different ribitol unit. All densities are labelled with the
letters used in Supplementary Table 2. The density marked with an asterisk
in m is consistent with ribitol glycosylated at O-4.

Extended Data Fig. 3 | Secondary structure of a TarP monomer and
interactions with UDP-GlcNAc. a, Cartoon representation of a TarP
monomer bound to UDP-GlcNAc (yellow) and Mn²⁺ (lime green). The
CTD is coloured red. b, Interactions of TarP with UDP-GlcNAc and Mn²⁺,
coloured as in a. Hydrogen bonds and salt bridges are shown as black
dashed lines. c, Interactions of TarP with UDP-GlcNAc (yellow)
and Mg²⁺ (magenta). d, Simulated-annealing (mFo − DFc) omit map of
UDP-GlcNAc (grey mesh, contoured at 2.0σ) and Mn²⁺ (magenta mesh, at
3.0σ) in the TarP-UDP-GlcNAc-Mn²⁺ complex structure. UDP-GlcNAc
and Mn²⁺ are coloured as in a. e, Simulated-annealing (mFo − DFc) omit
map of UDP-GlcNAc (grey mesh, at 2.0σ) and Mg²⁺ (blue mesh, at 2.0σ)
in the TarP-UDP-GlcNAc-Mg²⁺ complex structure. UDP-GlcNAc and
Mg²⁺ are coloured as in c.

Extended Data Fig. 4 | Simulated-annealing (mFo − DFc) omit maps
of 3RboP and UDP-GlcNAc, and characterization of TarP mutant
proteins. a, Chemical structures of synthetic 3RboP and 6RboP-
(CH₂)₆NH₂. The unit numbers are indicated. b, Simulated-annealing
(mFo − DFc) omit map of 3RboP (lime green) in the binary structure
(magenta mesh, contoured at 2.0σ). c, Simulated-annealing (mFo − DFc)
omit map of UDP-GlcNAc (yellow), Mg²⁺ (magenta) and 3RboP (lime
green) in the ternary complex structure (red mesh, at 1.8σ, blue mesh,
at 2.0σ or magenta mesh, at 1.5σ). d, Circular dichroism spectra of wild-
type and mutant TarP proteins (for wild type, R76A and D181A, n = 3;
for D92A, Y152A and R259A, n = 2). e, Size-exclusion chromatography
elution profiles of wild-type and mutant TarP proteins (for wild type,
n = 8; for R76A, D181A and R259A, n = 3; for D92A and Y152A, n = 2,
all with similar results). Mutant proteins D94A, E180A, D209A, K255A,
R262A, and H263A showed similar circular dichroism spectra and size-
exclusion chromatography elution profiles (data not shown).

Extended Data Fig. 5 | Gating strategy for flow cytometry experiments.
a, Gating strategy for IgG deposition experiments. To distinguish bacteria
from background signals, pure PBS was measured. Left, bacterial gating
occurred at the FSC/SSC density plot omitting PBS-derived signals.
Bacterial aggregates of high SSC and FSC values were excluded from the
gated population as well. Right, the mean fluorescence of the bacterial
population (black) was determined and compared with non-IgG-treated
bacteria (grey) to control for nonspecific binding of the secondary FITC-
labelled antibody. Subsequently, mean fluorescence values of individual
mutants were compared relatively to the corresponding wild-type strain.
b, Gating strategy for phagocytosis experiments. Neutrophils were
separated by Histopaque/Ficoll gradient and subsequent gating of
neutrophils occurred at the FSC/SSC density plot upon size and
complexity (left). Histopaque/Ficoll gradient isolations showed a
neutrophil purity of more than 80%. Using the CFSE-fluorescence channel,
the gated population was subdivided into fluorescence-positive and
-negative cells (right). Successful phagocytosis was indicated by uptake
of CFSE-labelled bacteria. The phagocytic efficiency was expressed as
product of the mean fluorescence of the fluorescence-positive population
and their relative abundance (mean fluorescence intensity, MFI).
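
The phagocytic efficiency defined in this legend (mean fluorescence of the fluorescence-positive population multiplied by its relative abundance) can be written as a short function. This is a sketch under stated assumptions: the function name, the fixed fluorescence threshold used to separate positive from negative cells, and the use of a fraction (0 to 1) rather than a percentage for relative abundance are illustrative choices, not details given in the paper.

```python
def phagocytic_efficiency(fluorescence, threshold):
    """Mean fluorescence of positive cells multiplied by their fraction.

    fluorescence: per-cell CFSE fluorescence values of the gated neutrophils
    threshold: value above which a cell counts as fluorescence-positive (assumed gate)
    """
    if not fluorescence:
        raise ValueError("no cells in the gated population")
    positive = [f for f in fluorescence if f > threshold]
    if not positive:
        return 0.0
    mean_positive = sum(positive) / len(positive)
    fraction_positive = len(positive) / len(fluorescence)
    return mean_positive * fraction_positive
```

For the hypothetical values 1, 2, 100 and 300 with a threshold of 50, half of the cells are positive with a mean fluorescence of 200, so the function returns 100.0.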

Extended Data Table 1 | tarP presence and podophage susceptibility of CC5 strains, comprising sequence type (ST) 5 and 225, and CC398 isolates

Collection             Rhine-Hesse collection    Danish LA-MRSA collection MRSA collection Tübingen
Clonal complex         5 (ST5 + ST225)           398                       5 (ST5 + ST225)
tarP status            Negative     Positive     Negative     Positive     Negative     Positive
n                      21           39           18           12           11           53
Phage susceptibility   Susceptible  Resistant    Susceptible  Resistant    Susceptible  Resistant
Φ44                    21           39           18           12           ND           ND
Φ66                    21           39           18           12           ND           ND
ΦP68                   21           39           18           12           ND           ND

tarP presence in three different S. aureus collections was determined by PCR using primer pair TarP_Ty_Fw/Rv. Phage susceptibility to podophages Φ44, Φ66, and ΦP68 was determined by soft-agar
overlay. Plaque formation indicated susceptibility; absence of visible plaque formation indicated resistance. ND, not determined.

Extended Data Table 2 | Enzymatic activities of mutated TarP proteins

Function              TarP variant   Activity (% of wild type)
Trimer interface      I322E          128
                      R76A           1
UDP-GlcNAc binding    D92A           2
                      D94A           14
                      D209A          105
                      E180A          15
Catalytic base        D181A          1
                      Y152A          44
                      K255A          99
3RboP binding         R259A          3
                      R262A          97
                      H263A          81

Extended Data Table 3 | Donor substrate specificity of TarP

Sugar nucleotide   Enzymatic activity (nmol/mg*min)
UDP-GlcNAc         2.20
UDP-Glc            0.01
UDP-GalNAc         0.03
UDP-Gal            0.01

Extended Data Table 4 | Crystallographic data statistics for TarP and TarP-UDP-GlcNAc-Mg²⁺

                        TarP native             TarP-SeMet (peak)       TarP-SeMet (inflection) TarP-UDP-GlcNAc-Mg²⁺
Data collection
Space group             P2₁                     P2₁                     P2₁                     P2₁
Cell dimensions
  a, b, c (Å)           43.37, 95.25, 125.47    44.06, 95.33, 130.72    43.99, 95.22, 130.52    43.85, 95.27, 130.22
  α, β, γ (°)           90.00, 96.57, 90.00     90.00, 93.41, 90.00     90.00, 93.34, 90.00     90.00, 93.49, 90.00
Wavelength (Å)          1.00004                 0.97941                 0.97952                 0.91841
Resolution (Å)          44.5-1.86 (1.91-1.86)   47.7-2.29 (2.35-2.29)   47.7-2.30 (2.35-2.30)   47.6-1.95 (2.00-1.95)
Rsym or Rmerge (%)      8.4 (87.7)              11.5 (103.8)            9.7 (62.2)              12.6 (110.1)
I/σ(I)*                 9.4 (1.4)               13.8 (1.8)              15.8 (2.9)              9.2 (1.3)
CC1/2 (%)               99.7 (50.0)             99.8 (64.0)             99.8 (81.9)             99.6 (50.6)
Completeness (%)        98.5 (97.5)             99.0 (88.4)             99.2 (90.9)             99.9 (99.7)
Redundancy              2.9 (2.7)               7.0 (6.5)               6.6 (6.0)               5.0 (5.0)

Phasing (TarP-SeMet)
Rcullis (ano)                                   0.76
Phasing power                                   1.24
HA sites / ASU                                  26
FOMacentric                                     0.41

Refinement
Resolution (Å)          44.5-1.86                                                               47.6-1.95
No. reflections         241855 (16740)                                                          386853 (28878)
Rwork / Rfree (%)       17.1/21.8                                                               17.7/22.4
No. atoms
  Protein               7538                                                                    7479
  Substrates            0                                                                       117
  Ions                  13                                                                      29
  Other molecules                                                                               24
  Water                                                                                         804
Average B-factors (Å²)
  Protein                                                                                       35.5
  Substrates                                                                                    43.9
  Ions                                                                                          44.6
  Other molecules                                                                               39.2
  Water                                                                                         41.0
R.m.s. deviations**
  Bond lengths (Å)                                                                              0.008
  Bond angles (°)                                                                               1.254
Ramachandran plot
  Favored (%)                                                                                   97
  Allowed (%)                                                                                   3
  Outliers (%)                                                                                  0

Values in parentheses are for highest-resolution shell. Two data sets for TarP-SeMet were collected from the same single crystal.
*I is the mean intensity, σ(I) is the standard deviation of reflection intensity I.
**r.m.s.d., root-mean-square deviation of bond length or bond angle.

Extended Data Table 5 | Crystallographic data statistics for TarP-UDP-GlcNAc-Mn²⁺, TarP-3RboP, TarP-6RboP-(CH₂)₆NH₂ and
TarP-UDP-GlcNAc-3RboP

TarP-UDP-GlcNAc-Mn²⁺   TarP-3RboP   TarP-6RboP-(CH₂)₆NH₂   TarP-UDP-GlcNAc-3RboP

Data collection
Space group   P2₁   P2₁   P2₁   P2₁
Cell dimensions
a, b, c (Å)   43.86, 95.36, 130.55   95.61, 217.27, 123.99   95.41, 211.25, 122.68   95.17, 210.75, 123.20
α, β, γ (°)   90.00, 93.51, 90.00   90.00, 91.38, 90.00   90.00, 91.61, 90.00   90.00, 91.92, 90.00
Wavelength (Å)   0.91840   1.00000   1.00002   1.00002
Resolution (Å)   47.7-1.80 (1.85-1.80)   49.8-2.18 (2.22-2.18)   48.5-2.40 (2.46-2.40)   48.4-2.73 (2.80-2.73)
Rsym or Rmerge (%)   5.6 (101.0)   13.7 (140.9)   15.6 (141.2)   25.4 (161.1)
I/σ(I)*   12.0 (1.3)   11.9 (1.5)   10.8 (1.5)   8.4 (1.4)
CC1/2 (%)   99.9 (51.1)   99.8 (54.0)   99.6 (50.7)   99.0 (52.3)
Completeness (%)   99.8 (99.5)   100.0 (100.0)   99.9 (100.0)   99.9 (99.8)
Redundancy   3.6 (3.3)   7.0 (6.6)   6.2 (6.4)   7.1 (7.4)
Refinement
Resolution (Å)   47.7-1.80   49.8-2.18   48.5-2.40   48.4-2.73
No. reflections   355981 (24195)   1833608 (128618)   1172903 (89756)   911354 (69899)
Rwork / Rfree (%)   17.6/21.3   17.1/20.7   19.6/22.7   19.2/23.5
No. atoms
Protein   7,543   29,987   29,709   29,439
Substrates   117   480   480   948
Ions   19   32   16   35
Other molecules   12   18
Water   739   2,694   1,555   1,383
Average B-factors (Å²)
Protein   37.6   46.1   51.2   53.0
Substrates   38.4   57.8   75.0   84.3
Ions   47.4   52.7   54.0   50.6
Other molecules   46.6   49.7
Water   43.7   49.4   48.6   41.4
R.m.s. deviations**
Bond lengths (Å)   0.010   0.009   0.008   0.010
Bond angles (°)   1.331   1.288   1.214   1.302
Ramachandran plot
Favored (%)   98.0   97.0   96.8   96.4
Allowed (%)   2.0   3.0   3.2   3.6
Outliers (%)   0   0   0   0

Values in parentheses are for highest-resolution shell.
*I is the mean intensity, σ(I) is the standard deviation of reflection intensity I.
**r.m.s.d., root-mean-square deviation of bond length or bond angle.

Cryptic connections illuminate pathogen
transmission within community networks


Joseph R. Hoyt, Kate E. Langwig, J. Paul White, Heather M. Kaarakka, Jennifer A. Redell, Allen Kurta, John E. DePue,
William H. Scullon, Katy L. Parise, Jeffrey T. Foster, Winifred F. Frick & A. Marm Kilpatrick


Understanding host interactions that lead to pathogen transmission 
is fundamental to the prediction and control of epidemics.
Although the majority of transmissions often occurs within social
groups, the contribution of connections that bridge groups and
species to pathogen dynamics is poorly understood. These
cryptic connections—which are often indirect or infrequent— 
provide transmission routes between otherwise disconnected 
individuals and may have a key role in large-scale outbreaks 
that span multiple populations or species. Here we quantify 
the importance of cryptic connections in disease dynamics 
by simultaneously characterizing social networks and tracing 
transmission dynamics of surrogate-pathogen epidemics through 
eight communities of bats. We then compared these data to the 
invasion of the fungal pathogen that causes white-nose syndrome, 
a recently emerged disease that is devastating North American 
bat populations. We found that cryptic connections increased
links between individuals and between species by an order of 
magnitude. Individuals were connected, on average, to less than 
two per cent of the population through direct contact and to only 
six per cent through shared groups. However, tracing surrogate- 
pathogen dynamics showed that each individual was connected to 
nearly fifteen per cent of the population, and revealed widespread 
transmission between solitarily roosting individuals as well as 
extensive contacts among species. Connections estimated from 
surrogate-pathogen epidemics, which include cryptic connections, 
explained three times as much variation in the transmission of the 
fungus that causes white-nose syndrome as did connections based 
on shared groups. These findings show how cryptic connections 
facilitate the community-wide spread of pathogens and can lead to 
explosive epidemics. 

Pathogens have repeatedly spilled over from wildlife to humans and 
caused epidemics at local to global scales. Social behaviour often constrains
the transmission of pathogens, with network structure increasing
transmission within social groups but limiting the spread to
solitary individuals, other social groups, and between species.
Contacts that bridge species and social groups are usually infrequent or
indirect (for example, via the environment), but they can lead to unexplained
cases and outbreaks, including the 2014 Ebola epidemic in
West Africa, Nipah virus transmission in Bangladesh, transmission
of Mycobacterium bovis in badgers and Mycoplasma in tortoises. For
white-nose syndrome, an emerging infectious disease of bats, impacts
are exceedingly high in some species that roost solitarily, which raises
the important question of how solitary individuals acquire infections. 

Cryptic connections are important because they can determine 
whether transmission is constrained to single-species, local outbreaks 
or spans multiple populations and species. However, the extent of 
cryptic connections among social groups and their influence on path- 
ogen dynamics is poorly known, in part, because of the difficulty in 
measuring these types of contacts. Prospective studies of contact rates
often take place over short time periods, are usually designed to capture
direct connections among individuals and can miss infrequent
or indirect connections. Retrospective methods such as contact
tracing and model estimation of contact rates sometimes uncover
cryptic connections, but determining the general importance of these
connections in transmission is often limited by epidemic-specific
details.

Here we examine how cryptic connections between social groups 
and species influence pathogen transmission in replicated epidemics 
in multiple populations of hibernating bats. As with many other species,
hibernating bats spend the majority of their time in relatively
small groups. However, individual bats arouse from hibernation for
short periods (1-3 h every 2-3 weeks, or <0.5% of winter) and may
leave these groups and contact individuals in other groups or areas
of the environment. North American hibernating bats have recently
been affected by white-nose syndrome, a disease caused by the fungal
pathogen Pseudogymnoascus destructans. We hypothesized that
cryptic connections among species and individuals may be important
in the transmission dynamics of white-nose syndrome, because the
spread of the pathogen that causes this syndrome occurs rapidly within
multiple species and some solitarily roosting species have suffered steep
declines. More generally, understanding the contribution of contacts
outside social groups in pathogen transmission remains an important 
question in epidemiology. 

We characterized connections among individuals using three meth- 
ods to compare networks with and without cryptic contacts. We first 
measured direct physical contacts among bats, as well as links through 
membership of shared groups (a group of bats hibernating together), 
at eight sites at which bat communities were comprised of four species 
(Supplementary Table 1). For both measures, we included the addi- 
tional connections bats made when switching between groups over 
the winter during periodic arousals from hibernation (see Methods). 
We simultaneously quantified the importance of cryptic connections 
in these communities by tracing epidemics of a surrogate pathogen, 
ultraviolet-fluorescent (UVF) dust (Extended Data Fig. 1), through 
each population. We estimated connections through physical contact, 
shared groups and UVF dust as the percentage of each species at each 
site that a given focal bat was connected to over the winter. By rep- 
licating epidemics within each population, we were able to account 
for variation in epidemic outcomes due to differences among indi- 
viduals and species”. The fungus that causes white-nose syndrome 
subsequently invaded all eight sites at which we studied UVF-dust 
epidemics (Extended Data Fig. 2a—h). This enabled us to compare the 
transmission of P destructans to connections within each community, 
as measured through UVF dust and shared group connections. 
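
The connection measure described above (the percentage of each species at a site to which a given focal bat was connected) can be illustrated with a short sketch. The data layout, the function name, the bat and species labels, and the choice to exclude the focal bat from the denominator are assumptions made for illustration; they are not taken from the authors' analysis code.

```python
from collections import defaultdict


def percent_connected(focal, edges, species_of):
    """Percentage of individuals of each species that `focal` is connected to.

    edges: iterable of (bat_a, bat_b) undirected connections at one site
    species_of: dict mapping every bat at the site to its species
    The focal bat itself is excluded from the counts (assumption).
    """
    neighbours = set()
    for a, b in edges:
        if a == focal:
            neighbours.add(b)
        elif b == focal:
            neighbours.add(a)

    totals = defaultdict(int)
    linked = defaultdict(int)
    for bat, species in species_of.items():
        if bat == focal:
            continue
        totals[species] += 1
        if bat in neighbours:
            linked[species] += 1
    return {sp: 100.0 * linked[sp] / totals[sp] for sp in totals}
```

The same function applies to any of the three networks (physical contact, shared groups or UVF dust); only the edge list changes.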

For hibernating bats, networks created from physical contact and 
shared groups substantially underestimated connections within 
populations, compared to epidemic networks of UVF dust (Fig. 1, 
Extended Data Fig. 3). Connections within species revealed by the 
Fig. 1 | Observed physical contact and surrogate-pathogen epidemic
networks for four communities of hibernating bats. a–h, Each row
shows two networks (one based on physical contact and one based on the
spread of UVF dust) for one of four sites (labelled WI-SP, WI-SJ, MI-BC
and MI-MC). Each circle (node) represents an individual bat, and colour
indicates species. Larger numbered circles (1–7) are bats that were
UVF-dusted with unique colours in early winter. Lines (edges) between nodes
indicate physical contact among bats in shared groups at the end of winter
(a–d) or UVF dust that originated from a UVF-dusted bat (large circle) in
epidemic networks (e–h). Each UVF-dust epidemic network in e–h shows
seven simultaneous replicate epidemics, with each epidemic originating
from a single, numbered UVF-dusted bat. For example, in d and h the
arrow indicates an M. septentrionalis bat roosting solitarily (d) that was
re-sighted with UVF dust from four different epidemics (from bats 1, 3, 4
and 7; h). Locations of nodes (bats) in the two networks are identical, so
the spread of UVF dust within and between clusters of touching bats can
be visualized in the UVF-dust networks.


transmission of UVF dust were threefold higher among individuals 
than was apparent from direct physical contact or shared groups for two 
of the three focal species (Myotis lucifugus and Myotis septentrionalis; 
Fig. 2, Extended Data Fig. 4, Supplementary Tables 2-4). Over 92% 
of M. septentrionalis individuals roosted alone (average group size of
1.34 ± 0.068 individuals; range 1–3; Extended Data Fig. 5) and were
connected to only 7.3 ± 0.2% of other M. septentrionalis individuals by
physical contact and 8.1 ± 0.2% by shared groups (Fig. 2b, left and
middle, red columns; Supplementary Table 5, line 19). Despite the solitary
roosting behaviour of this species, each M. septentrionalis individual
was connected to 24.7 ± 4.3% of other M. septentrionalis individuals
at each site by UVF dust (Fig. 2b, right, red column; Supplementary
Table 5, lines 26, 27). For the most gregarious species, M. lucifugus
(average group size 3.25 ± 0.34 individuals, range 1–53; Fig. 1, Extended
Data Figs. 3, 5), individuals were in direct physical contact with only
5.0 ± 0.1%, and in shared groups with 20.0 ± 0.2%, of the M. lucifugus
population at each site (Fig. 2a, left and middle, light-blue columns;
Supplementary Table 6, line 12), whereas UVF dust originating from
each individual bat was transmitted to 28.6 ± 5.5% of the M. lucifugus
individuals at each site (Fig. 2a, right, light-blue starred column;
Supplementary Table 6, lines 1, 2). By contrast, for Perimyotis subflavus,
another solitary species (99% of individuals roosted individually,
n = 62; Fig. 1, Extended Data Figs. 3, 6), there were few additional
connections among individuals as detected by the UVF dust, when
compared with physical contact or shared groups (Fig. 2c, orange columns;
Supplementary Table 7).

The extent of environmental contamination with UVF dust for P.
subflavus (533 ± 163 mm²) was not substantially lower than for the
other species (929 ± 245 mm² and 1,100 ± 170 mm² for M. lucifugus
and M. septentrionalis, respectively) and therefore was not sufficient to
explain differences in connections (Extended Data Fig. 7a). However,
the spatial overlap of UVF-dust colours in the environment was
significantly lower for P. subflavus individuals (63.3 ± 5.8%) compared
to M. septentrionalis and M. lucifugus (86.2 ± 3.7% and 86.8 ± 3.7%,
respectively; Extended Data Fig. 7b), which suggests that spatial
segregation may limit both indirect and direct transmission for P. subflavus.

Connections between species were underestimated even more by 
measurements of physical contact and shared groups than were con- 
nections within species (for example, see Fig. 2a). Overall, the spillover 
of UVF dust among species was 17- and 12-fold higher than expected 
based on physical contact and shared groups, respectively (Fig. 2). For 
example, M. lucifugus was—on average—in direct contact with only 
0.0-0.1% of the M. septentrionalis, P. subflavus or Eptesicus fuscus pop- 
ulations (Fig. 2b; Supplementary Table 6, lines 14, 17, 20), and was only 
in shared groups with 0.0-1.7% of individuals for these three species 
(Fig. 2b; Supplementary Table 6, lines 24, 27, 30). However, inclusion 
of cryptic connections revealed through UVF-dust epidemics showed 
that each individual M. lucifugus was connected to an average of 22% 
of E. fuscus, 14% of M. septentrionalis and 1% of P. subflavus individuals 
at each site (Figs. 1, 2, Extended Data Fig. 3; Supplementary Tables 2, 6, 
lines 3, 6, 9). 

For M. lucifugus, group size affected among-species and with- 
in-species transmission differently. The probability of an M. lucifu- 
gus individual having UVF dust from another M. lucifugus individual 
increased with group size, whereas UVF dust from M. septentrionalis 
was more likely to be found on solitary-roosting M. lucifugus individ- 
uals (Extended Data Fig. 8). This suggests that transmission of UVF 
dust within a site was density-dependent for M. lucifugus, and that M. 
septentrionalis—which are solitary—preferentially transmit UVF dust 
to solitary-roosting M. lucifugus. 

The UVF-dust epidemics suggested that P. destructans, the fungus
that causes white-nose syndrome, would spread rapidly through pop- 
ulations of some species of bats but not others (Fig. 2), and the subse- 
quent invasion of P. destructans supported this prediction. Prevalence of 
P. destructans on M. lucifugus (Fig. 3a) and M. septentrionalis individu- 
als (Fig. 3b) increased rapidly over the winter hibernation period (mean 
84%, range 55-100%) at all sites. By contrast, transmission was much 
lower in P. subflavus; prevalence at three of the four sites remained 
below 25% at the end of winter (Fig. 3c). Across all species, UVF-dust 
transmission explained 3.5 times as much variation in P. destructans
transmission as connections estimated using shared groups, and the 
similarity in transmission patterns between P. destructans and UVF 
Fig. 2 | Contact, group and UVF-dust epidemic connections among
individuals. Per cent of individuals of each species at each site observed in
physical contact, in a shared group or with UVF dust that originated from a
single focal individual. a, M. lucifugus (n = 3,362 physical contact estimates,
n = 3,345 shared group estimates, n = 74 UVF-dust estimates).
b, M. septentrionalis (n = 413 physical contact estimates, n = 426 shared group
estimates, n = 51 UVF-dust estimates). c, P. subflavus (n = 204 physical contact
estimates, n = 199 shared group estimates, n = 45 UVF-dust estimates). Each
point indicates the percentage of the population of a single species at a site that is
connected to a single focal individual. For example, the arrow in the right section
of a shows the percentage of M. septentrionalis individuals at each site with each
colour of UVF dust that originated from a UVF-dusted M. lucifugus. The arrow
in the middle section of a shows the percentage of M. septentrionalis individuals
at each site that were in a shared group with each individual M. lucifugus over
the winter. Connection estimates for physical contact and shared social groups
include connections that the bats made when switching positions in the network
during arousals (see Methods). Letters above bars indicate groups for which
95% high posterior density intervals for pairwise differences did not include 0
(Supplementary Tables 5–7), and stars indicate within-species connections. The
lower and upper hinges of the box plots show the first and third quartiles, and
the black lines indicate the mean. The upper and lower whiskers extend to the
largest and smallest values within 1.5 times the interquartile range.


712 | NATURE | VOL 563 | 29 NOVEMBER 2018 


[Fig. 3 graphic. Panels a–c: P. destructans prevalence by month (Nov to Mar) for Myotis lucifugus (a), Myotis septentrionalis (b) and Perimyotis subflavus (c), with one symbol per site (MI-BC, MI-GA, MI-MC, MI-TM, WI-SB, WI-SJ, WI-SP, WI-ST). Panels d, e: change in pathogen prevalence plotted against change in shared group connections (d) and change in UVF-dust prevalence (e), with points coloured by species.]


Fig. 3 | Pathogen and surrogate-pathogen transmission dynamics.
Pathogen (P. destructans) prevalence over time during the first year of
invasion, at the eight sites at which UVF-dust epidemics were studied for three
species. a, M. lucifugus (n = 207 bats sampled). b, M. septentrionalis (n = 116
bats sampled). c, P. subflavus (n = 125 bats sampled). Points show pathogen
prevalence (± 1 s.e.m.) for a species at a site and lines show fitted models for
the change in pathogen prevalence over time; generalized linear mixed-effects
model, binomial distribution with logit link and site as a random effect
(a, M. lucifugus, −3.71 (±0.80) + 1.43 (±0.20) × month, P = 3.14 × 10⁻¹³;
b, M. septentrionalis, −1.14 (±0.20) + 0.41 (±0.27) × month,
P = 1.11 × 10-”; and c, P. subflavus, 0.50 (±1.18) − 0.75 (±0.30) × month,
P = 0.003). d, Pathogen transmission (transmission = (Pd_March − Pd_November)/
(1 − Pd_November), in which Pd is P. destructans prevalence) plotted against
the change in shared-group connections, including movement between
groups over winter (n = 15 species-site estimates). The curves show
the nonlinear-mixed-model fitted relationship, with site as a random
effect (pathogen transmission = 1/(1 + e^(1.93 (±1.52) × (change in shared-group
connections − 0.03 (±0.29)))), P = 0.19). e, Pathogen transmission plotted against
UVF-dust transmission (n = 15 species-site estimates). The line shows the
fitted linear-mixed-model relationship, with site as a random effect, and the
grey region shows the 95% confidence interval of the regression (pathogen
transmission = 0.13 (±0.12) + 0.825 (±0.15) × UVF-dust transmission,
P = 0.0006). The dashed line shows the 1:1 line for comparison. All P values
are two-tailed.


dust demonstrates the importance of cryptic connections in driving 
epidemics (Fig. 3d, e). 

Observing cryptic connections among individuals and groups via 
UVF-dust epidemics provides insight into the transmission patterns 
of P. destructans, which are difficult to understand from patterns of
contact among species. M. septentrionalis and P. subflavus were both 
primarily observed as solitary-roosting individuals, which should limit 
transmission in both species relative to M. lucifugus, which roosts in 
larger groups (Extended Data Fig. 5). Although transmission rates 
in P. subflavus were indeed lower, transmission of P. destructans was
equally high in M. septentrionalis and M. lucifugus (Fig. 3a, b). The 
UVF-dust epidemics suggested that this was because extensive cryptic 
connections link M. septentrionalis individuals with each other and to 
M. lucifugus. By contrast, P. subflavus had few observable or cryptic 
connections, and the lack of connections resulted in much lower rates 
of transmission. These results show that cryptic connections were not
simply proportional to observed contacts, but varied among species and
individuals. Cryptic connections uncovered by these data can also
inform conservation actions that aim to reduce the effect of white- 
nose syndrome on bat populations (see ‘Conservation implications’ 
in Supplementary Information). 
Characterizing all connections among individuals is vital to
understanding, predicting and controlling outbreaks of infectious
diseases. We have shown that cryptic connections among groups
and species have a key role in transmission dynamics and explain why 
transmission is far more intense in some species than others. Cryptic 
connections not only link social groups within species but also create 
bridges among species, resulting in highly connected communities and 
explosive epidemics. Characterizing the locations, frequencies and var- 
iation in cryptic connections is needed to more accurately predict and 
prevent future epidemics. 


Online content 

Any methods, additional references, Nature Research reporting summaries, source 
data, statements of data availability and associated accession codes are available at 
https://doi.org/10.1038/s41586-018-0720-z. 
METHODS


Study sites and sampling for P. destructans. We studied contact rates, social
networks and fungal infection with P. destructans in bats at eight abandoned
mines in Wisconsin and Michigan over five winters (Extended Data Fig. 2a–h,
Supplementary Table 1; winters of 2012/2013–2016/2017). Sampling to measure
pathogen transmission was conducted over all five winters and
surrogate-pathogen tracing (UVF dust) occurred over two winters (2013/2014 and 2014/2015;
Extended Data Fig. 2a–h). There were 183 ± 72 (range 18–624) total bats of three
or four species hibernating at each site (Supplementary Table 1). Bats in this region
begin hibernating in September–October and leave hibernacula in April–May.
We visited each site twice during each winter in all five years. In November and 
March, we counted all bats by species and sampled bats to test for the presence 
of P. destructans, the fungus that causes white-nose syndrome (Extended Data 
Fig. 2a–h). We sampled up to 20 individuals of each species for P. destructans, using
a previously described swab-sampling technique during each visit. Samples
were placed in RNAlater (Thermo Fisher Scientific) and subsequently tested for
P. destructans by qPCR. A power analysis was used to predetermine sample
size for pathogen data collection. At six sites, P. destructans invasion occurred after
the UVF-dust study was conducted (1–3 winters later; Extended Data Fig. 2). At
two sites, P. destructans was detected during the UVF-dust study (Extended Data
Fig. 2b, c). However, there were no observable declines over the winter at these
two sites and observable white-nose syndrome symptoms were largely absent in
the first year, as has been observed at other sites.

Quantifying direct contact, shared groups and UVF-dust transmission. We 
quantified connections among hibernating bats in three ways: (1) bats that were 
physically touching each other, (2) bats hibernating in groups (a group of bats 
touching each other) and (3) bats sharing a surrogate pathogen (a unique colour 
of UVF dust). The first two measures of social interactions (physical contact and
shared social groups) are commonly used to characterize human, livestock and
wildlife communities. Transmission of UVF dust mirrors the transmission
ecology of P. destructans, which is an epidermal fungal pathogen that is transmitted by
direct or indirect contact among bats and the environment.

In November, we applied a unique colour of UVF dust randomly to each of 
4-7 individual male bats at each site (with one exception; see below). To deter- 
mine potential sample sizes for field experiments, we assessed the ability of four 
independent observers to distinguish and re-sight colours of UVF dust. Both large 
patches and trace amounts of dust were spread on rock surfaces and then examined 
to determine which colours could be distinguished from each other. We initially 
compared 10 colours of UVF dust and found that with a combination of UV and 
visible light, seven could be reliably distinguished from each other. The seven 
unique colours included in the study were: ECO-11 Aurora Pink, ECO-15 Blaze
Orange, ECO-16 Arc Yellow, ECO-17 Saturn Yellow, ECO-18 Signal Green, ECO-19
Horizon Blue (Day-Glo Colour); and DFSB-C0 Clear Blue (Risk Reactor).

We randomly applied one gram of a single colour of UVF dust to each individ- 
ual bat. Each bat was dusted over a clean bag to ensure that there was no transfer 
of UVF-dust colours between bats (Extended Data Fig. 1a). The UVF dust was 
applied by spreading the dust along the entire dorsal and ventral surface of the bat, 
and limiting the application of dust to the face and head (Extended Data Fig. 1b, 
c). We used the same quantity of dust for all three species because they broadly 
overlap in body size (M. lucifugus: 222–269 mm (wing span), 60–102 mm (length);
M. septentrionalis: 230–260 mm (wing span), 78 mm (length); and P. subflavus:
220–250 mm (wing span), 77–89 mm (length)). We also placed a unique wing
band on each bat for identification. 

We returned to each site during March and searched the environment and 
inspected all bats for UVF dust using UV flashlights (395 nm; Hayward) and visible 
light. The high reflectance of the UVF dust made dusted areas in the environment 
and on bats easily visible. We carried a colour key that consisted of small amounts 
of the seven colours of UVF dust fixed to black construction paper to help to con- 
firm colour identification. Each colour of UVF dust observed on each bat was 
recorded (Extended Data Fig. 1d, e) and observers were blinded during resighting 
to the original species colour combination at each site. To aid in identification of 
the colour of small patches of dust, a piece of clear tape was used to collect a sample 
of the dust. This tape was affixed to a plastic microscope slide, and analysed under 
a dissecting microscope in the laboratory to confirm the colour. The number of 
individuals detected with each UVF-dust colour was used as an estimate of the 
total epidemic size resulting from a single (dusted) individual and includes both 
direct bat-to-bat contacts in and outside shared hibernating groups, and indirect 
bat-environment-bat contacts. 

We also quantified the area of each patch of UVF dust, by colour, in the hiber- 
nacula environment to the nearest square centimetre (Extended Data Fig. 1f) and 
mapped all environmental contacts to the nearest metre location using a metre 
tape. For sites with branching passages or passages wider than 2 m, we indicated 
the location of each environmental contact on a map of the hibernaculum. A 
team of 2–3 people visually inspected the entire surface area of the site using UV
flashlights. Each patch of UVF dust (Extended Data Fig. 1f) was measured using a
fine-scale (mm) ruler to estimate its size. If there was
any uncertainty in the colour of a patch of dust, we collected a sample of the dust 
as described above. 

We dusted individuals of three species of bats, M. lucifugus, M. septentrionalis 
and P. subflavus. We refer to M. lucifugus as the more social species because it 
often (>50%) occurred in groups of more than one bat. By contrast, over 90% of 
individuals of the two other species roosted alone, and thus we refer to them as 
solitary. A fourth species (E. fuscus) was present at six of the eight sites. We did 
not dust individuals of this species, but we did inspect them for UVF dust. At six 
sites, we dusted three individuals of one species and four individuals of another 
(Supplementary Table 1). At one site, we only dusted four individuals of P. subflavus
because the abundance of other species was too low (Supplementary Table 1). At 
an eighth site, we dusted five M. septentrionalis with five unique colours and five 
M. lucifugus with a single colour. For analyses for this site, we divided the total 
number of individuals receiving dust from M. lucifugus by five and included this 
estimate as a single data point in relevant analyses (Fig. 2, Supplementary Table 1). 

To quantify direct connections among bats, we recorded groups of bats in
physical contact and in shared groups in March. Bats spend ~99.3% of winter in these
groups, making them similar to social groups in other systems. However, bats
do sometimes switch among groups during winter. To quantify direct connections 
among bats via shared groups, we performed simulations to estimate the number 
of direct contacts and shared connections when the bats periodically aroused from 
hibernation in between our visits to the site in November and March. We used 
several datasets to examine variation in group sizes, cluster mate fidelity within a 
winter and UVF-dust transmissibility to help to guide simulations. 

First, we investigated whether the group size (clustering behaviour) of hiber- 
nating bats differed over the winter period. We analysed data collected over a 
five-year period on the group sizes of the four species of bats mentioned above. 
We compared social-group size between early and late winter using generalized 
linear mixed-effects models with a Poisson distribution and a log link. We found
no significant difference in the range or frequency of group sizes over the
winter across the three UVF-dusted species (Extended Data Fig. 6a). Group sizes
of E. fuscus slightly increased over the winter, but the probability of roosting with 
other species did not change (logistic regression with site as a random effect: early 
winter, intercept: −3.23 ± 0.59; late winter, coefficient: −0.66 ± 0.93, P = 0.476)
and within-species transmission was not estimated for E. fuscus, because we did 
not apply UVF dust to this species. Our analyses of group-size data also indicated 
that there were no significant differences in roosting behaviour between the two 
winters during which the UVF-dust data were collected and the winters during 
which the pathogen data were collected (Extended Data Fig. 6b), which suggests 
that group sizes were relatively consistent over time before populations declined 
from white-nose syndrome.

Second, we examined the probability that bats were found in groups with 
the same bats in November and March using data from individually banded 
bats observed within a single winter. These data indicate that for 24 pairs of 
bats that roosted in a group together in November, only 2 pairs were found 
together in March (despite both members of the pair being observed in both 
November and March). Nine of these bats became solitary between November 
and March, and ten individuals roosting solitarily in early winter were found in 
shared groups in March. These data suggest that only some bats remain with 
the same roost-mates over winter, and that many bats re-assort among groups 
during hibernation. 

Finally, we conducted an experiment to assess the number of transmission 
chains through which UVF dust spreads (for example, primary, secondary trans- 
mission and so on). We initially captured 18 M. lucifugus individuals. We dusted 
three bats and placed them individually in 38 × 38 × 61-cm mesh terrariums
(Restcloud, via Amazon). After one hour, we added a single individual male 
M. lucifugus to each enclosure with a single dusted bat to examine transmission 
from a dusted bat to a primary contact. After 60 s, we removed the primary-con- 
tacted bats and inspected them for UVF dust, as described above. We then placed 
the primary-contacted bats individually into clean cages. After 60 min, we placed 
one non-dusted individual male bat into each of the enclosures with each primary 
contacted bat for 60 s, to examine secondary transmission. We repeated this pro- 
cess for dusted bats, primary bats and secondary bats four hours later, for a total 
of nine measurements of transmission between primary-contacted bats and sec- 
ondary-contacted bats, and six measurements of transmission between originally 
dusted bats and primary bats. We found that all of our primary contacted bats 
(6 out of 6) had UVF dust immediately following interaction with the originally 
UVF-dusted bats, and this dust remained visible for the duration of the experi- 
ment (6 h). Three of the nine secondary-contacted bats initially had UVF dust 
after their interactions with the primary-contacted bats, but none of these three 
bats retained the UVF dust 2-5 h later (Supplementary Table 8; UVF dust was 
probably removed by bats grooming the small amounts of dust they obtained 
during secondary contact). This experiment suggests that UVF dust observed
on bats in March most probably came from primary contacts with originally
dusted bats or from the environment, and not from secondary or tertiary contacts
among individuals.

We used these data to guide simulations to estimate the number of bats at each 
site with which a focal bat was likely to have shared direct physical and social-group 
connections, between our visits to the site. We used the observed contact networks 
of bats observed in March (Fig. 1, Extended Data Fig. 3), and randomly rearranged 
bats within each site while maintaining the same distribution of connections for 
each rearrangement using R packages ‘igraph’ (http://igraph.org) and ‘picante’,
keeping track of all primary connections. We performed six rearrangements,
which reflects the average arousal frequency of hibernating bats (every 16 days) and
the duration between our visits to examine the spread of the UVF dust (90 days;
range: 88–93 days), resulting in approximately 5.6 rearrangements over our study.
Each site adjacency matrix was divided into within- and among-species blocks for 
each species combination (M. lucifugus-M. lucifugus (block 1), M. lucifugus-M. 
septentrionalis (block 2) and so on). We randomized within each block six times, 
maintaining the total number of edges or connections for each combination from 
the original social network. Symmetry was maintained for within-species connec- 
tions in the matrix. We allowed each individual to contact the same individuals or 
return to the same position in the network. We then calculated the total number 
of unique direct connections per individual by summing across all six randomly 
generated matrices or rearrangements. 
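
The rearrangement procedure above can be sketched roughly as follows. This is a simplified illustration in Python, not the igraph/picante workflow used in the study: it handles a single within-species block, preserves only the number of edges in that block (not the full degree distribution), and assumes the 5.6 figure is rounded to the nearest whole number. The function names and the toy block size are invented for the example.

```python
import itertools
import random
from typing import List, Set, Tuple


def n_rearrangements(interval_days: float, arousal_days: float) -> int:
    """Number of reshuffles between visits, rounded to the nearest whole number."""
    return round(interval_days / arousal_days)


def random_block(n_bats: int, n_edges: int,
                 rng: random.Random) -> Set[Tuple[int, int]]:
    """Place n_edges undirected edges uniformly at random among n_bats bats."""
    all_pairs = list(itertools.combinations(range(n_bats), 2))
    return set(rng.sample(all_pairs, n_edges))


def unique_contacts(n_bats: int, n_edges: int, n_rounds: int,
                    seed: int = 0) -> List[Set[int]]:
    """Union of each bat's contacts over all rearrangements (symmetric)."""
    rng = random.Random(seed)
    contacts: List[Set[int]] = [set() for _ in range(n_bats)]
    for _ in range(n_rounds):
        for i, j in random_block(n_bats, n_edges, rng):
            contacts[i].add(j)
            contacts[j].add(i)
    return contacts


rounds = n_rearrangements(90, 16)  # 90 / 16 = 5.625 arousals between visits
contacts = unique_contacts(n_bats=20, n_edges=5, n_rounds=rounds)

# Percentage of the (toy) species population each bat contacted over the winter.
percent = [100.0 * len(c) / 20 for c in contacts]
```

Because a bat may draw the same partner in more than one round, the union of contacts is usually smaller than the edge count multiplied by the number of rounds, which matches the statement that individuals were allowed to contact the same individuals again.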

We performed simulations for shared-group rearrangement in a similar manner. 
Individuals were randomly rearranged among positions held by the same species 
within the network, while maintaining the edge distribution within blocks and 
including the possibility of an individual remaining a member of the same group. 
We summed the unique number of individuals that a focal individual was con- 
nected to over the six rearrangements. 

We divided the total number of connections for each individual, made through
physical contacts and social-group sharing, by the total population of that species at
a site to calculate the percentage of each species connected to a focal individual. The
percentages represent the total direct connections made by bats over the winter. A
comparison of connections from the snapshot data to connections estimated using
six rearrangements is provided in Extended Data Fig. 4.

Analyses. We quantified the percentage of bats of each species at a site that were
connected to each focal bat either through physical contact, shared social group 
membership or sharing a colour of UVF dust. For the connection estimates using 
the first two measures (physical contact and shared groups), every bat in the site 
served as a focal bat and all other bats were Bernoulli trials in a binomial sam- 
ple. For UVF-dust connections, each bat originally dusted in November served 
as a focal bat, and all other bats were a Bernoulli trial in a binomial sample. We 
compared the number of connections among connection types (physical con- 
tact, shared group member and UVF dust) and species (including within- versus 
between-species) by fitting a binomial hierarchical model with Bayesian methods 
using the no-U-turn sampler, an extension of Hamiltonian Markov chain Monte 
Carlo, with site as a random effect. We fit models and estimated the posterior
distributions for all parameters using the R package ‘brms’, which uses the programming
language Stan via the ‘rstan’ package
(https://cran.r-project.org/web/packages/rstan/index.html). We included weakly
informative priors for all parameters (normal distribution with mean of zero and
standard deviation of five). We ran a total
of 4 chains for 2,000 iterations each, with a burn-in period of 1,000 iterations per
chain resulting in 4,000 posterior samples, which, given the more efficient
no-U-turn sampler, was sufficient to achieve adequate mixing and convergence. All R̂
values were less than or equal to 1.01 indicating model convergence.
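
To make the sampling arithmetic and the convergence check concrete, the sketch below counts retained draws and computes a potential scale reduction factor for toy chains. It is an assumption-laden illustration in Python: the formula is the classic, non-split Gelman-Rubin statistic, which is not necessarily the variant that Stan reports, and the chains are simulated normal draws rather than output from the fitted models.

```python
import random
import statistics
from typing import List


def posterior_draws(chains: int, iterations: int, burn_in: int) -> int:
    """Draws retained after discarding the burn-in from every chain."""
    return chains * (iterations - burn_in)


def gelman_rubin(chains: List[List[float]]) -> float:
    """Classic (non-split) potential scale reduction factor, equal-length chains."""
    n = len(chains[0])
    within = statistics.fmean([statistics.variance(c) for c in chains])
    # Variance of the chain means equals B / n in the usual notation.
    between_over_n = statistics.variance([statistics.fmean(c) for c in chains])
    pooled = (n - 1) / n * within + between_over_n
    return (pooled / within) ** 0.5


total = posterior_draws(chains=4, iterations=2000, burn_in=1000)

rng = random.Random(1)
# Four well-mixed toy chains sampling the same distribution.
mixed = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]
# The same chains, but with the last one stuck around a different value.
stuck = mixed[:3] + [[x + 3.0 for x in mixed[3]]]

rhat_mixed = gelman_rubin(mixed)
rhat_stuck = gelman_rubin(stuck)
```

Values close to 1 indicate that between-chain and within-chain variability agree, which is the sense in which a threshold of 1.01 is read as evidence of convergence.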

We assessed differences in connections between species and by connection type 
(direct contact, shared social group and UVF dust) by determining whether the 
95% credible intervals for pairwise comparisons between any two groups included 
zero (for example, physical contact between M. lucifugus and P. subflavus and UVF- 
dust connections between M. lucifugus and M. septentrionalis). We used Bayesian 
methods in this analysis to account for issues of complete separation for some 
species and data-type combinations. For example, P. subflavus was never in phys- 
ical contact with M. septentrionalis, and the parameter estimating the difference 
between this binomial probability in a logistic regression and one in which there 
are some connections or non-zero values (for example, M. lucifugus in contact 
with M. lucifugus) would be undefined in a frequentist analysis. The inclusion of 
(weak) prior information in the Bayesian approach makes it possible to handle 
these complete separation issues. 
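
The effect of a weak prior under complete separation can be illustrated with a one-parameter toy problem. The Python sketch below is not the hierarchical model fitted in the study: it assumes a single binomial probability on the logit scale with a Normal(0, 5) prior, and the counts (0 contacts in 50 trials) are invented. With zero successes the likelihood alone keeps increasing as the logit goes to minus infinity, so no finite maximum-likelihood estimate exists, whereas the posterior mode is finite.

```python
import math


def log_posterior(theta: float, successes: int, trials: int,
                  prior_sd: float) -> float:
    """Binomial log-likelihood in logit form plus a Normal(0, prior_sd) log-prior.

    Additive constants that do not depend on theta are dropped.
    """
    log_lik = successes * theta - trials * math.log1p(math.exp(theta))
    log_prior = -0.5 * (theta / prior_sd) ** 2
    return log_lik + log_prior


def map_logit(successes: int, trials: int, prior_sd: float = 5.0,
              lo: float = -40.0, hi: float = 40.0) -> float:
    """Posterior mode found by ternary search (the log-posterior is concave)."""
    for _ in range(200):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if log_posterior(m1, successes, trials, prior_sd) < \
                log_posterior(m2, successes, trials, prior_sd):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2.0


# Invented example: a species pair never observed in contact (0 of 50 trials).
theta_hat = map_logit(successes=0, trials=50)
p_hat = 1.0 / (1.0 + math.exp(-theta_hat))
```

The prior pulls the estimate back from minus infinity to a finite logit, so differences between this probability and a non-zero one remain defined.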

We assessed potential biases in re-sighting different colours of UVF dust by 
comparing models including both site and UVF-dust colour as random effects to 
models with only site as a random effect, using Bayesian methods identical to those 
described above. These models were only fit to the UVF-dust connection dataset 
because differences between UVF-dust colours do not apply to other connection 
types. The coefficient estimates and credible intervals for each species connection 
were similar among analyses including and excluding colour as a random effect
(Supplementary Table 9). 

We also examined whether the probability of having UVF dust increased with 
group size for M. lucifugus. For each origin species (M. lucifugus, M. septentrionalis 
and P. subflavus), we examined correlations between M. lucifugus cluster size and 
the probability of having UVF dust from any dusted individual, using generalized 
linear mixed-effects models with a binomial distribution and a logit link with site 
and individual bat as random effects, and group size as a fixed effect, using the 
‘lme4’ package. Individual bat was included as a random effect because some
individuals were re-sighted with multiple colours of UVF dust, and thus are
replicated in the analyses. One site (MI-GA) was excluded from the within-species
M. lucifugus analysis because five individuals were dusted with the same colour 
at this site. 

We examined differences among species in the probability of becoming infected 
with P. destructans over the winter at the same sites, using a generalized linear 
mixed-effects model with a binomial distribution and logit link, with species inter- 
acting with date as fixed effects, and site as a random effect. We examined the 
correlation between P. destructans transmission and both UVF-dust transmission 
and shared-group connections. We regressed the change in P. destructans prev- 
alence over the winter during the first year of fungal invasion on the change in 
UVF-dust prevalence over the winter, or the change in shared-group connections 
over the winter for the same dusted individuals at each site using linear and
nonlinear mixed-effects models with site as a random effect using the ‘nlme’ package.

We examined differences among species in the total surface area of the environ- 
ment covered with UVF dust, using a linear mixed-effects model with species as 
a fixed effect, and site as a random effect. Finally, we examined the differences in 
spatial segregation within hibernacula for each bat originally dusted in November. 
The ‘home range’ or area within each hibernaculum used by an individual was 
calculated by using locations in the environment in which UVF dust from each of 
our originally dusted bats was detected, as described above. We estimated home 
range overlap for each focal bat by computing the proportion of the home range 
of one focal bat that is overlapped by another focal bat using the ‘kerneloverlap’ 
function in the R package ‘adehabitatHR’®. This function uses the kernel utilization 
distribution, which is a bivariate function giving the probability density that an 
animal is found at a point according to its geographical coordinates. We removed 
10% of the outlying points, and restricted analyses to individuals with >10 dust 
locations detected in each site'®. To estimate a kernel utilization distribution for 
sites at which dust locations were taken with a single coordinate (length along 
the mine tunnel), we created an additional coordinate location (tunnel width) by 
drawing 100 times from a uniform distribution between 0 and 2 (the approximate 
width of the mine tunnel, 2 m), and averaging percentage overlap values for each 
individual. We then compared the percentage overlap of areas used by each spe- 
cies using a linear mixed-effects model with species as a fixed effect, and site as a 
random effect. All statistical tests were carried out using R 3.3.2. 
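
The handling of single-coordinate sites can be sketched in Python as follows. This is a simplified, grid-based stand-in and not the ‘kerneloverlap’ algorithm from ‘adehabitatHR’: the function names, the fixed bandwidth `h`, the grid step and the use of a 90% density isopleth (as an approximation of removing 10% of the outlying points) are all assumptions made for illustration.

```python
import math
import random

def kernel_density_grid(points, xs, ys, h):
    """Bivariate Gaussian kernel density evaluated at each grid node."""
    norm = 1.0 / (2.0 * math.pi * h * h * len(points))
    grid = []
    for y in ys:
        row = []
        for x in xs:
            total = 0.0
            for px, py in points:
                total += math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2.0 * h * h))
            row.append(norm * total)
        grid.append(row)
    return grid

def home_range_cells(density, level=0.90):
    """Smallest set of grid cells that holds `level` of the total density."""
    cells = [(d, i, j) for i, row in enumerate(density) for j, d in enumerate(row)]
    cells.sort(reverse=True)
    total = sum(d for d, _, _ in cells)
    chosen, running = set(), 0.0
    for d, i, j in cells:
        if running >= level * total:
            break
        chosen.add((i, j))
        running += d
    return chosen

def overlap_proportion(range_a, range_b):
    """Proportion of home range A that is also inside home range B."""
    return len(range_a & range_b) / len(range_a)

def mean_overlap_1d(lengths_a, lengths_b, width=2.0, draws=100,
                    h=1.0, step=0.5, seed=1):
    """Average overlap after adding a uniform tunnel-width coordinate."""
    rng = random.Random(seed)
    all_lengths = list(lengths_a) + list(lengths_b)
    lo, hi = min(all_lengths) - 3 * h, max(all_lengths) + 3 * h
    xs = [lo + step * k for k in range(int((hi - lo) / step) + 1)]
    ys = [-3 * h + step * k for k in range(int((width + 6 * h) / step) + 1)]
    values = []
    for _ in range(draws):
        # Second coordinate drawn from Uniform(0, width), as in the Methods.
        pts_a = [(x, rng.uniform(0.0, width)) for x in lengths_a]
        pts_b = [(x, rng.uniform(0.0, width)) for x in lengths_b]
        range_a = home_range_cells(kernel_density_grid(pts_a, xs, ys, h))
        range_b = home_range_cells(kernel_density_grid(pts_b, xs, ys, h))
        values.append(overlap_proportion(range_a, range_b))
    return sum(values) / len(values)
```

Averaging over repeated draws keeps the arbitrary width coordinate from driving the overlap estimate, so differences between bats reflect their positions along the tunnel.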

We complied with all relevant ethical regulations and all work was approved and 
performed under protocol FrickW1106, approved by the University of California, 
Santa Cruz IACUC. 

Code availability. Supporting code and the comma separated value data files are 
available via GitHub at https://github.com/hoytjosephr/cryptic-connections.git. 
Reporting summary. Further information on research design is available in 
the Nature Research Reporting Summary linked to this paper. 


Data availability 

All raw data points are contained within the main-text Figs. and Extended Data 
Figs. All other data are available from the corresponding author upon reasonable 
request. 
Extended Data Fig. 1 | Application and re-sighting of UVF dust on bats 
and in the environment. a, Applying UVF dust to P. subflavus. b, A dusted 
P. subflavus shortly after release. c, M. septentrionalis after UVF-dust 
application. d, A group of eight M. lucifugus roosting together in March. 
Each bat in the group had UVF dust. e, A dusted M. lucifugus in a group of 
three bats during March. At least three of the bats in the photo can be seen 
with UVF dust. f, Location in the environment with patches of UVF dust 
in March next to a hibernating P. subflavus. 
Extended Data Fig. 2 | Trends of population counts and timing of 
UVF-dust and disease data collection. The grey bar shows the winter 
period when the UVF-dust study was conducted and the red bar shows the 
year that P. destructans arrived. Temporal patterns of social group size are 
provided in Extended Data Fig. 6a. 
Extended Data Fig. 3 | Social and epidemic networks for four additional 
communities of hibernating bats. a–h, Each row shows two networks 
(physical contact and UVF dust) for each of four sites (WI-ST, WI-SB, 
MI-TM and MI-GA). Each circle (node) represents an individual bat, 
with colour indicating species; larger numbered circles (1-7) are bats that 
were dusted with UVF dust with unique colours in November at each 
site. Lines (edges) between nodes indicate physical contact among bats in 
social networks (a-d) or UVF dust that originated from the UVF-dusted 
bat (large circle) in epidemic networks (e-h). Each UVF-dust epidemic 
network in e-h shows seven replicate epidemics, with each epidemic 
originating from a single, numbered UVF-dusted bat. Locations of nodes 
(bats) in the two networks are identical, so the spread of UVF dust within 
and between groups of touching bats can be visualized in the UVF-dust 
network. 
Extended Data Fig. 4 | Comparison of observed data from late-winter 
and total shared-group connections including group rearrangements. 
The y axes show the percentage of individuals of each species at each 
site observed in direct physical contact, in shared groups or with 
UVF dust originating from a single focal individual. a, M. lucifugus. 
b, M. septentrionalis. c, P. subflavus. Small points show the observed 
(grey) data and augmented (coloured) connections, including group 
rearrangements. The black open points show the mean late winter estimate 
of direct contact and social-group connections based on a single time 
point of the populations without group switching (Fig. 1, Extended Data 
Fig. 3; left panels). The large coloured points (means in Fig. 2) show the 
mean number of connections from simulations in which bats rearranged 
during six arousals over the winter (also shown in Fig. 2). Where the 
large coloured points are within the black open points, there were no 
differences between the observed and augmented data including group 
rearrangements. The connections observed in UVF-dust epidemics are 
shown for comparison, but do not differ. 
Extended Data Fig. 5 | Shared-group (cluster) size for four species of 
hibernating bats. Different letters above bars indicate mean group sizes 
that differ significantly; generalized linear model with negative binomial 
distribution with site as a random effect and species as a fixed effect 
(E. fuscus, n = 202 observed groups, intercept: 0.020 ± 0.11; M. lucifugus, 
n = 1,011 observed groups, coefficient: 0.450 ± 0.08; M. septentrionalis, 
n = 151 observed groups, coefficient: −0.03 ± 0.12; P. subflavus, n = 263 
observed groups, coefficient: −0.058 ± 0.11). The lower and upper hinges 
of the box plot show the first and third quartiles. The upper and lower 
whiskers extend to the largest and smallest values within 1.5 times the 
interquartile range of the hinges. 
Extended Data Fig. 6 | Distributions of shared-group sizes for four 
species of hibernating bats in early (November) and late (March) winter 
and during the collection of UVF-dust and fungal infection data. 
a, Across all species, there was no significant change in social-group size 
over the winter using season (early versus late winter) as a fixed effect, and 
site and species as random effects (early winter, intercept: 0.38 ± 0.21; late 
winter, coefficient: 0.01 ± 0.03, P = 0.716, n = 1,956 bat groupings). We 
also investigated changes in social-group size for M. lucifugus, using an 
identical model but dropping species as a random effect. Again, we found 
no significant change in social-group size over the winter (early winter, 
intercept: 0.84 ± 0.17, n = 457; late winter, coefficient: −0.008 ± 0.04, 
n = 528, P = 0.84). Neither model was significantly better than a 
null-intercept model. For E. fuscus, shared-group sizes increased slightly 
over the winter (early winter, intercept: 0.11 ± 0.11, n = 79; late winter, 
coefficient: 0.28 ± 0.14, n = 100, P = 0.040). b, We compared social-group 
size (n = 1,328) between the disease-arrival year and the dust-study year 
using a generalized linear mixed-effects model with a Poisson distribution 
and a log link. We found no significant difference in social-group sizes 
among years using study year as a categorical fixed effect (dust or disease) 
and site and species as random effects (disease study, intercept: 0.32 ± 0.16; 
UVF-dust study, coefficient: −0.04 ± 0.04, P = 0.33). 
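
As a worked illustration of the log link in b, the short Python sketch below back-transforms the reported fixed-effect estimates to the response scale. It assumes that the coefficients are on the natural-log scale of group size and that the random effects are zero; the function name is hypothetical.

```python
import math

def expected_group_size(intercept, coefficient=0.0):
    """Back-transform a log-link linear predictor to the response scale."""
    return math.exp(intercept + coefficient)

disease_year = expected_group_size(0.32)        # reference level (disease study)
dust_year = expected_group_size(0.32, -0.04)    # UVF-dust study
ratio = dust_year / disease_year                # equals exp(-0.04)

print(round(disease_year, 2), round(dust_year, 2), round(ratio, 3))
```

Under these assumptions the coefficient acts multiplicatively, so the dust-study year differs from the disease-arrival year by a factor of exp(−0.04), consistent with the non-significant difference reported.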
Extended Data Fig. 7 | Environmental contamination and spatial 
overlap of UVF-dusted individuals. a, Total surface area of the 
hibernacula walls and ceilings with UVF dust originating from different 
species (n = 51 measurements). Small black points show the summed total 
surface area with each colour of dust. Large points show the mean and 
95% confidence intervals of the three species that were originally dusted in 
November (generalized linear mixed-effects model with site as a random 
effect and species as a fixed effect; M. lucifugus, intercept: 2.71 ± 0.15; 
M. septentrionalis, coefficient: 0.14 ± 0.17, t = 0.79; P. subflavus, coefficient: 
−0.21 ± 0.18, t = −1.16; species effect, P = 0.30). b, Percentage overlap 
of hibernacula areas used by individual bats compared to areas used by 
other individuals (n = 172 individual overlap estimates). The home range 
was calculated for each individual UVF-dust colour based on locations 
in the environment at which bats deposited their unique colour of 
UVF dust (see Methods). The points show an overlap estimate for each 
individual dusted bat compared to another dusted bat at that same site 
with larger points showing the mean and 95% confidence intervals for 
each species (generalized linear mixed-effects model with site as a random 
effect and species as a fixed effect; M. lucifugus, intercept: 0.88 ± 0.05; 
M. septentrionalis, coefficient: −0.03 ± 0.05, t = −0.50; P. subflavus, 
coefficient: −0.21 ± 0.07, t = −2.86; species effect, P = 0.02). Points are 
jittered in b to show overlapping data. 
Extended Data Fig. 8 | The probability of an individual being re-sighted 
with UVF dust by group size. a, The probability of M. lucifugus 
(n = 1,697 individuals re-sighted for all dust colours) individuals having 
UVF dust that originated from another M. lucifugus increased with group 
size (generalized linear mixed-effects model with site and individual bat 
as random effects, and group size as a fixed effect; group size, coefficient: 
0.017 ± 0.005; intercept, −0.596 ± 0.201, P = 0.0015). b, The probability of 
M. lucifugus (n = 4,164 individuals re-sighted for all dust colours) having 
UVF dust that originated from dusted M. septentrionalis decreased with 
M. lucifugus group size (group size coefficient, −0.013 ± 0.004; intercept, 
−1.973 ± 0.221, P = 0.0003). This suggests that M. septentrionalis were 
more likely to contact M. lucifugus roosting solitarily than in groups, 
whereas M. lucifugus roosting in groups were more likely to be contacted 
by other M. lucifugus than were M. lucifugus individuals that roosted 
solitarily. 
Efferocytosis induces a novel SLC program to 
promote glucose uptake and lactate release 


Sho Morioka, Justin S. A. Perry, Michael H. Raymond, Christopher B. Medina, Yunlu Zhu, Liyang Zhao, 
Vlad Serbulea, Suna Onengut-Gumuscu, Norbert Leitinger, Sarah Kucenas, Jeffrey C. Rathmell, Liza Makowski & 
Kodi S. Ravichandran 


Development and routine tissue homeostasis require a 
high turnover of apoptotic cells. These cells are removed by 
professional and non-professional phagocytes via efferocytosis!. 
How a phagocyte maintains its homeostasis while coordinating 
corpse uptake, processing ingested materials and secreting anti- 
inflammatory mediators is incompletely understood”. Here, using 
RNA sequencing to characterize the transcriptional program of 
phagocytes actively engulfing apoptotic cells, we identify a genetic 
signature involving 33 members of the solute carrier (SLC) family 
of membrane transport proteins, in which expression is specifically 
modulated during efferocytosis, but not during antibody-mediated 
phagocytosis. We assessed the functional relevance of these SLCs in 
efferocytic phagocytes and observed a robust induction of an aerobic 
glycolysis program, initiated by SLC2A1-mediated glucose uptake, 
with concurrent suppression of the oxidative phosphorylation 
program. The different steps of phagocytosis*—that is, ‘smell’ 
(‘find-me’ signals or sensing factors released by apoptotic cells), 
‘taste’ (phagocyte-apoptotic cell contact) and ‘ingestion’ (corpse 
internalization)—activated distinct and overlapping sets of genes, 
including several SLC genes, to promote glycolysis. SLC16A1 was 
upregulated after corpse uptake, increasing the release of lactate, 
a natural by-product of aerobic glycolysis*. Whereas glycolysis 
within phagocytes contributed to actin polymerization and the 
continued uptake of corpses, lactate released via SLC16A1 promoted 
the establishment of an anti-inflammatory tissue environment. 
Collectively, these data reveal a SLC program that is activated during 
efferocytosis, identify a previously unknown reliance on aerobic 
glycolysis during apoptotic cell uptake and show that glycolytic by- 
products of efferocytosis can influence surrounding cells. 

To identify the pathways potentially involved in efferocytosis, we 
performed RNA sequencing of LR73 hamster phagocytes engulfing 
apoptotic human Jurkat cells (to clearly distinguish phagocyte-derived 
RNA; Fig. 1a, Extended Data Fig. 1). Efferocytic phagocytes displayed 
changes in multiple transcriptional programs, including decreased 
expression of pro-inflammatory genes, increased expression of actin 
rearrangement/cell motility genes and increased expression of anti- 
inflammatory genes, consistent with previous findings*” (Fig. 1a). In 
addition, we identified gene programs such as upregulation of glycoly- 
sis-associated genes and downregulation of genes required for oxidative 
phosphorylation (OXPHOS), fatty acid oxidation (FAO) and de novo 
cholesterol synthesis (Fig. 1a, Supplementary Table 1). 

We also noted extensive modulation of genes encoding solute carrier 
(SLC) proteins. SLCs are membrane proteins, localized in the plasma 
membrane, mitochondrial membrane and other internal membranes, 
which facilitate transport of molecules including sugars, nucleotides 
and amino acids across the membrane®*. Mutations in approximately 
100 of the 400 known SLCs are linked to human disease®®. In LR73 
phagocytes, expression levels of 33 SLCs (out of 165 detected) were 
modified during efferocytosis: 19 were upregulated and 14 were down- 
regulated (Extended Data Fig. 2, Supplementary Table 2). 

We categorized the 33 SLCs on the basis of association with phys- 
iological processes, experimentally or by homology (Extended Data 
Fig. 2b), and constructed an integrated network linking each SLC and 
its assigned functions with other SLCs modified during efferocytosis 
(Fig. 1b). All but two SLCs (Slc6a4 and Slc45a4) could be clustered into 
eight categories: carbohydrate metabolism; intracellular pH regulation; 
membrane stability and volume regulation; nucleoside salvage; vita- 
min transport; glycosylation; amino acid transport and catabolism; and 
OXPHOS and FAO. This analysis also reveals coordinated regulation 
of SLCs linked to particular physiological function(s): multiple SLCs 
associated with carbohydrate metabolism were upregulated, whereas 
SLCs associated with OXPHOS and FAO were downregulated (Fig. 1b). 
The changes in SLC expression were confirmed by quantitative reverse 
transcription with PCR (qRT-PCR) using hamster-specific primers 
(Extended Data Fig. 3a, b). Thus, efferocytosis induces a specific SLC 
signature that potentially modulates multiple physiological processes. 

Mouse peritoneal macrophages displayed similar modulation 
of SLCs during efferocytosis, suggesting that professional and non- 
professional phagocytes shared a similar response (Fig. 2a, Extended 
Data Fig. 3b and Supplementary Table 3). Notably, macrophages ingest- 
ing Jurkat cells coated with CD3 antibody (with comparable phagocy- 
tosis), did not show changes in the same SLCs, except for Slc29a2 and 
Slc29a3, which responded in the opposite direction to the efferocytosis 
response (Fig. 2a). Thus, in addition to the type of engulfed cell, the 
type of phagocytic receptor used influences the SLC program in phagocytes. 
Furthermore, when apoptotic Jurkat cells were injected intraperitoneally 
into mice, efferocytic CD11b(high)F4/80(high) macrophages 
exhibited similar changes in SLC gene expression to those seen in the 
in vitro efferocytosis (Fig. 2b). 

In a time course of efferocytosis, expression of several SLCs was 
modulated early (0-4 h), whereas others were upregulated later (after 
4h). Even within a group of SLCs linked to a particular function, we 
observed early and late modulation of expression. These changes were 
not strict, and mRNA and protein concentrations exhibited continuous 
variation through the time course (Extended Data Fig. 4). 

Distinct stages or phases in efferocytosis have been identified!”, 
including (i) smell phase, in which apoptotic cells and phagocytes 
communicate via soluble mediators; (ii) taste phase, in which ligand- 
receptor interactions are established between apoptotic cells and phago- 
cytes; and (iii) ingestion phase, in which corpses are internalized and 
processed. To identify SLCs that are modulated during the different 
stages of efferocytosis, we performed and compared RNA-seq analyses 
Fig. 1 | Transcriptional programs initiated during efferocytosis. 

a, Phagocytes regulate distinct transcriptional modules during 
efferocytosis. RNA-seq was performed on LR73 hamster fibroblasts that 
had been incubated with apoptotic human Jurkat cells. We detected 
changes in expression of 1,450 hamster genes. These genes were 
categorized according to primary function and sequence similarity. 


of LR73 cells treated only with apoptotic cell supernatants with those 
of phagocytosing LR73 cells treated with cytochalasin D, which allows 
corpse binding but not internalization (Fig. 2c). These data, using SLC 
induction as a readout, provide further evidence for ongoing commu- 
nication between apoptotic cells and phagocytes. 

Efferocytosis is an energy-intensive process, as it requires energy 
for dynamic actin rearrangements to engulf corpses that are often 
nearly as large as the phagocyte®. We focused on SLC2A1 (also 
known as GLUT1), a glucose transporter that facilitates uptake of 
glucose from the extracellular medium!®, as SLC2A1 was strongly 
upregulated in LR73 cells and macrophages early during efferocyto- 
sis (Fig. 2, Extended Data Fig. 4b). First, overexpressing SLC2A1 in 
LR73 cells increased efferocytosis (Fig. 3a). Second, treatment with 
STF-31, an inhibitor of SLC2A1, reduced efferocytosis in wild-type and 
SLC2A1-overexpressing LR73 cells (Fig. 3a). Third, small interfering 
RNA (siRNA) knockdown of Slc2a1 reduced corpse uptake (Fig. 3b); 
this effect was rescued by co-transfection with siRNA-resistant Slc2a1 
(Fig. 3b). Fourth, CRISPR-Cas9 deletion of Slc2a1 reduced efferocytosis 
in LR73 cells (Fig. 3c). Fifth, Slc2a1-deficient bone-marrow-derived 
macrophages (BMDMs) displayed decreased efferocytosis (Fig. 3d), 
but retained normal antibody-mediated phagocytosis (which does not 
modulate Slc2a1 expression) compared to untreated cells (Extended 
Data Fig. 5a). Deletion efficiencies for Slc2a1 are shown in Extended 
Data Fig. 5b-d. 

In vivo administration of STF-31'' before intraperitoneal injection 
of apoptotic Jurkat cells decreased efferocytosis by peritoneal mac- 
rophages (Fig. 3e), but not phagocytosis of CD3-antibody-coated Jurkat 
cells (Extended Data Fig. 5e). Moreover, STF-31 did not further reduce 
efferocytosis in Slc2a1-deficient LR73 cells or BMDMs (Extended Data 
Fig. 5f), indicating that the effects were specific to SLC2A1. We also 
tested corpse clearance in the thymus after induction of apoptosis by 
dexamethasone injection”, with or without co-injection of STF-31. 
There was a modest increase in uncleared or secondarily necrotic 
thymocytes in response to STF-31 alone, and a significant increase in 
necrotic thymocytes in response to treatment with both dexamethasone 
and STF-31 (Fig. 3f, Extended Data Fig. 5g). 

To complement the pharmacological approach, we used two genetic 
approaches. First, using zebrafish expressing GFP in macrophages 
(Tg(mpeg1:GFP)), we targeted the Slc2a1 orthologue via morpholino 
oligonucleotide injection’’. In control morpholino-treated embryos, 
GFP+ macrophages exhibited numerous phagocytic puncta (containing 
and/or associated with neutral red), whereas slc2a1a morphant embryos 
had fewer such associations (Fig. 3g). Quantifying z-stack images, and 
focusing on macrophages in the trunk region, slc2a1a morphants 
per DESeq2 algorithm was <0.1. Data are from four independent 
experimental replicates. UPR, unfolded protein response. b, SLC genes 
that are differentially regulated during efferocytosis are represented 
using network analysis to determine family clusters (shaded areas) and 
connectedness between individual SLCs. 


displayed fewer neutral red+ macrophages and less neutral-red staining 
per macrophage (Fig. 3g). This also suggested an evolutionarily 
conserved role for SLC2A1. Second, we tested the Slc2a1 requirement 
in a mouse model of atherosclerosis, in which defective apoptotic cell 
clearance can manifest as increased necrotic cores within plaques". 
Bone marrow transplantation was performed with cells from 
myeloid-targeted LysM-Cre Slc2a1fl/fl mice (myeloid differentiation 
was not affected) into atherosclerosis-prone Ldlr−/− mice. After 
Western diet feeding (12 weeks), there was a significant increase of 
the necrotic core area in the aortic roots of mice deficient in Slc2a1 in 
the myeloid lineage (Fig. 3h). The number of TUNEL-positive nuclei 
(late apoptotic cells) also significantly increased within necrotic cores, 
implicating defective corpse clearance (Fig. 3h). Taken together, these 
results show that SLC2A1 contributes to engulfment of apoptotic cells 
both in vitro and in vivo. 

As SLC2A1 is a glucose transporter®’”, we investigated whether 
glucose uptake is important for efferocytosis. Switching LR73 cells to 
glucose-free medium at the initiation of efferocytosis reduced uptake 
of apoptotic cells (Fig. 4a); conversely, adding exogenous glucose 
increased uptake of apoptotic cells (Fig. 4a), a phenotype that was atten- 
uated by silencing Slc2a1 (Extended Data Fig. 6a). We also measured 
glucose uptake directly using the non-metabolizable glucose analogue 
2-deoxyglucose (2-DG); 2-DG uptake increased approximately three- 
fold during efferocytosis (Fig. 4b). Further, phagocytes pre-treated with 
2-DG showed decreased efferocytosis (Extended Data Fig. 6b). Thus, 
SLC2A1-mediated glucose uptake is an important step in efferocytosis. 

Seahorse analysis of LR73 phagocytes (Fig. 4c) or BMDMs (Extended 
Data Fig. 6c) showed increased aerobic glycolysis and decreased 
OXPHOS in engulfing phagocytes (Fig. 4c). Analysis of the RNA-seq 
data from engulfing phagocytes (Fig. 1a) revealed upregulation of mul- 
tiple glycolysis genes (Fig. 4d, Extended Data Fig. 6d), with concurrent 
downregulation of OXPHOS and FAO genes. siRNA-mediated knock- 
down of PDK1 (Fig. 4e) or PDK4 (Fig. 4f), which promote aerobic 
glycolysis’®, or treatment with the pan-PDK inhibitor dichloroacetate 
(Fig. 4g), resulted in decreased efferocytosis. Notably, when we com- 
pared BMDMs treated with dichloroacetate to block aerobic glycol- 
ysis with BMDMs treated with rotenone and antimycin A1 to block 
OXPHOS, we found that PDK inhibition reduced efferocytosis but not 
antibody-mediated phagocytosis, whereas OXPHOS inhibition reduced 
antibody-mediated phagocytosis, but not efferocytosis (Extended Data 
Fig. 7a). 

Phosphorylation of SLC2A1 by SGK1 increases the abundance of 
SLC2A1 at the plasma membrane’®. Sgk mRNA was upregulated dur- 
ing efferocytosis (Fig. 4d, Extended Data Fig. 6d), and targeting SGK1 




29 NOVEMBER 2018 | VOL 563 | NATURE | 715 


© 2018 Springer Nature Limited. All rights reserved. 




[Fig. 2 panels. a, Engulfment (per cent CypHer5E+) with and without 
apoptotic cells (AC), for efferocytosis and for antibody-mediated 
phagocytosis (control or IgG2a-coated cells). b, Apoptotic Jurkat cells 
injected; peritoneal cells isolated after 2 h; FACS analysis of 
CD11b+F4/80+ peritoneal macrophages (axes: F4/80, CypHer5E). 
c, SLCs by stage. Smell (find-me signals): Slc26a6, Slc35a4, Slc46a1, 
Slc7a7, Slc25a45, Slc33a1. Taste (ligand-receptor interactions): Slc2a1, 
Slc20a1, Slc1a4. Ingestion (internalization): Slc16a1, Slc29a1, Slc6a6, 
Slc15a3, Slc35d2, Slc35f5.] 

Fig. 2 | Specific SLC signatures induced during different contexts of 
efferocytosis. a, The SLC signature during efferocytosis is distinct from 
that of antibody-mediated phagocytosis. Peritoneal macrophages were 
incubated with apoptotic Jurkat cells (AC) or Jurkat cells coated with IgG 
antibody against CD3, and SLC gene expression was measured by qRT- 
PCR. Upregulated (green), downregulated (red) and unchanged (white) 
SLC expression is shown (left). CypHer5E fluorescence in macrophages 
engulfing targets (right). Two independent experiments with 3-4 replicates 
per condition. b, Modulation of SLC expression in efferocytic peritoneal 
macrophages in vivo. Flow cytometric profiles of CD11b(high)F4/80(high) and 
CD11b(low)F4/80(low) macrophages, and engulfing peritoneal macrophages 
(CypHer5E+) (bottom left). Quantification of SLC expression in 
the isolated CD11b(high) fraction by qRT-PCR using mouse-specific 
primers (right). In b, right panel, the grey bar for Slc29a2 signifies that 
there was no increase in expression of this SLC in the presence of apoptotic 
cells, relative to the control condition. Data represent two replicates with 6 
mice per group per experiment. c, Specific SLC signature during different 
stages of efferocytosis. RNA-seq was performed using mRNA from LR73 
cells treated for 4 h with supernatants of apoptotic cells, or LR73 cells 
treated with cytochalasin D and incubated with apoptotic cells. SLC genes 
altered by supernatant alone (smell), and cytochalasin D-sensitive SLCs 
(ingestion) were used to identify SLCs responding to ligand-receptor 
interactions (taste) (red, upregulated; blue, downregulated). For clarity, 
SLCs in more than one stage are not shown. In all figures, *P < 0.05, 
**P < 0.01, ***P< 0.001. 


with siRNA or the SGK1 inhibitor GSK650394 decreased uptake of 
apoptotic cells (Fig. 4h, Extended Data Fig. 7b). Further, efferocytic 
BMDMs from transgenic mice expressing an extracellular Myc tag on 
SLC2A1!° exhibited increased cell-surface expression of Myc-SLC2A1; 
this effect was also inhibited by GSK650394 (Extended Data Fig. 7c). 

Phagocytes frequently ingest multiple apoptotic corpses sequen- 
tially’. To address whether SLC2A1 is required for the uptake of the 
first corpse, or for continued uptake of further corpses, we treated 
phagocytes with inhibitors for SLC2A1 or SGK1 at different times dur- 
ing efferocytosis. SLC2A1 function was required for uptake of the first 
corpse as well as for continued uptake of additional corpses (Extended 
Data Fig. 7d). 

Corpse internalization requires substantial actin polymerization, 
which has been linked to aerobic glycolysis during cell migration!” 
Increased actin polymerization in efferocytic phagocytes (indicated by 
phalloidin staining) was inhibited by treatment with either STF-31 or 
2-DG (Fig. 4i). Inhibiting PDK1, which favours aerobic glycolysis, also 
reduced F-actin formation (Fig. 4i). Thus, SLC2A1-mediated glucose 
Fig. 3 | SLC2A1 promotes glucose uptake and efferocytosis. 
a, Overexpression of SLC2A1 in BMDMs increases efferocytosis. 
Treatment with the SLC2A1 inhibitor STF-31 reduces phagocytosis and 
abolishes the effect of SLC2A1 overexpression (OE). b, Slc2a1 siRNA 
inhibits efferocytosis; this effect is rescued by co-transfection with 
siRNA-resistant Slc2a1 cDNA. c, SLC2A1 knockout by CRISPR-Cas9 causes a 
decrease in phagocytosis of apoptotic cells in LR73 cells. WT, wild type. 
d, SLC2A1 deletion in BMDMs from Slc2a1fl/fl mice using TAT-Cre inhibits 
phagocytosis. Phagocytosis index = per cent engulfment (experimental/control). 
Data from > 2 independent experiments with 3-4 replicates 
per condition. e, Treatment with the SLC2A1 inhibitor STF-31 reduces 
efferocytosis in vivo. f, STF-31 promotes accumulation of necrotic 
thymocytes after dexamethasone-induced apoptosis in vivo. Data 
represent two independent experiments with 3-4 mice per group. 
g, Targeting the Slc2a1 orthologue reduces efferocytosis in zebrafish. 
Tg(mpeg1:GFP) embryos were injected with control or slc2a1a 
morpholino. Neutral red was used to preferentially stain acidic organelles. 
slc2a1a-targeted morphants (50 h post fertilization (hpf)) displayed less 
apoptotic cell engulfment (neutral red+ GFP-labelled macrophages) in the 
trunk region. Three areas and three fish per group. Data are mean ± s.d. 
Scale bar, 50 µm. h, Increased necrotic atherosclerotic area and TUNEL+ 
cells after myeloid-specific deletion of Slc2a1. Top, schematic of bone 
marrow chimaeras using Slc2a1fl/fl and LysM-Cre Slc2a1fl/fl mice. Middle, 
serial interrupted 5-µm sections stained with Masson's Trichrome. 
Representative photomicrographs and quantification of necrotic core area 
normalized to total area. Bottom, TUNEL staining and quantification of 
TUNEL+ cells per necrotic core. Data are mean ± s.e.m. of 7-8 mice per 
group. Scale bar, 200 µm. 


uptake and glucose utilization in aerobic glycolysis contribute to actin 
polymerization during efferocytosis. 

Distinct steps of SLC2A1-dependent aerobic glycolysis in phagocytes 
were regulated by the smell, taste and ingestion phases of efferocyto- 
sis (Fig. 5a—c). Apoptotic supernatant was sufficient to increase Sgk1 
expression, but not that of Slc2a1 (Fig. 5a). ATP (a known find-me sig- 
nal)!” also increased Sgk1 expression (Extended Data Fig. 8a). Similarly, 
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Fig. 4 | Glycolytic pathway intermediates facilitate efferocytosis. 

a, Glucose in medium influences efferocytosis. No glucose, grey; 
physiological glucose (1 mg m7’), black; high glucose (5 mg ml~'), 
orange. Glucose was added to medium simultaneously with apoptotic 
cells. Data from at least 3 independent experiments with 2-3 replicates 
per condition. Note that this differs from long-term glucose-free pre- 
treatment of phagocytes’. b, Increased glucose uptake via SLC2A1 during 
efferocytosis. LR73 cells were co-cultured with apoptotic Jurkat cells for 
2 h, washed and 2-DG uptake was measured. Data from 2 independent 
experiments with 3 replicates each. c, Increased glycolysis in phagocytes 
during apoptotic cell clearance. Glycolysis and OXPHOS were measured 
during efferocytosis (with Seahorse XF) using extracellular acidification 
rate (ECAR) and oxygen consumption rate (OCR). Data are mean + s.d. 
of two replicates per condition from two independent experiments. 

d, Heat map showing upregulation of glycolysis-associated genes during 
efferocytosis. e-h, Effect of siRNA targeting of Pdk1 (e), Pdk4 (f) or Sgk1 
(h), or dichloroacetate, a pan-PDK inhibitor (g), on efferocytosis in LR73 
cells. Data represent three independent experiments with 3-4 replicates 
per condition. i, F-actin formation during efferocytosis. LR73 phagocytes 
were co-incubated with CFSE-labelled apoptotic thymocytes for 30 min, 
stained with phalloidin and actin polymerization was measured by
flow cytometry. PDK1 inhibitor experiments were performed separately. 
MFI, mean fluorescence intensity; rel., relative. Data represent at least 3 
independent experiments with 3-4 replicates. 


binding of apoptotic targets or phosphatidylserine (PtdSer) liposomes 
to phagocytes (without internalization) was sufficient to induce Slc2a1 
(Fig. 5b). Masking PtdSer on targets (using BAI1-TSR, the PtdSer- 
binding domains of BAI1) reduced upregulation of Slc2a1 (but not 
that of Sgk1) during efferocytosis (Extended Data Fig. 8b). Therefore,

Fig. 5 | SLC16A1-mediated lactate release promotes an anti-inflammatory
environment. a-c, Expression of Slc2a1, Slc16a1 and
Sgk1 during distinct steps of efferocytosis, determined by qRT-PCR.
mRNA induction by supernatant of apoptotic cells (a), PtdSer-containing
liposomes (b) or apoptotic cells with or without cytochalasin D (1 μM)
for 4 h (c), which were added to LR73 cell cultures. d, Lactate release from
efferocytic phagocytes. LR73 cells treated with control or Slc16a1 siRNA
were incubated with apoptotic cells, washed, incubated for an additional
4 h, and lactate was measured in the cells and in the medium. Data in
a-d represent 2 independent experiments with 3-4 replicates per 
condition. e, f, Supernatant from LR73 phagocytes (e) was added to 
BMDMs, and the effect on expression of anti-inflammatory or M2-like 
marker genes was analysed by qRT-PCR (f). Data represent 2 independent 
experiments with 2-3 replicates per condition. g, Schematic showing the 
upregulation and function of the SLC2A1-SGK1-SLC16A1 axis during 
efferocytosis. 


SGK1—triggered by factors released from apoptotic cells—prepares 
the phagocytes by increasing abundance of endogenous SLC2A1 on 
the plasma membrane, and PtdSer-dependent interactions increase 
new transcription and expression of SLC2A1. 

Following corpse internalization in phagocytes, there was increased 
expression of Slc16a1, a plasma membrane proton-linked monocarbox- 
ylate transporter of lactate and pyruvate!® (Figs. 1b, 2a-c, 5c, Extended 
Data Fig. 8b). siRNA-mediated knockdown of Slc16a1 in LR73 cells 
reduced efferocytosis in vitro (Extended Data Fig. 8c), and SR13800—a 
bioactive inhibitor of SLC16A1—reduced apoptotic cell uptake by peri- 
toneal macrophages in vivo (Extended Data Fig. 8d). Supernatants of 
engulfing LR73 cells contained threefold-higher lactate concentration 
(approximately 5 mM) compared to supernatants of phagocytes with- 
out apoptotic cells (approximately 1.5 mM) (Fig. 5d). siRNA knock- 
down of Slc16a1 reduced lactate concentration in the supernatants to 
approximately 2 mM, with concomitant accumulation in phagocytes 
(Fig. 5d), indicating that SLC16A1 contributes to lactate release from 
engulfing phagocytes. 

Lactate released from solid tumours could act on naive macrophages 
to induce M2 macrophage-like polarization’. To test whether factors 
released via SLC16A1 during efferocytosis might also promote anti- 
inflammatory skewing of naive macrophages”, we tested the effect of 
supernatants of engulfing LR73 phagocytes on BMDMs (Fig. 5e). These 
supernatants induced upregulation of anti-inflammatory macrophage 
genes such as Tgfb1 and Il10 as well as anti-inflammatory or M2-like
markers, including Vegfa, Mgl1, Mgl2 and CD206 (also known as Mrc1),
whereas pro-inflammatory markers (Tnf and Il6) were not affected.
Slc16a1 knockdown attenuated this effect in engulfing LR73 cells 
(Fig. 5f, Extended Data Fig. 8e, f). Therefore, aerobic glycolysis induced 
during efferocytosis affects efferocytosis at two levels: regulation of 
corpse uptake through regulation of actin polymerization (involving 
SLC2A1); and modulation of expression of anti-inflammatory genes 
in neighbouring cells (via SLC16A1). 

The results presented here provide several key insights. Although 
the SLC family is the second largest among membrane proteins (after 
G-protein-coupled receptors), there is much less knowledge of SLC 
function in specific physiological contexts®. This work describes coor- 
dinated regulation of select SLCs during efferocytosis (Fig. 5g). The 
smell, touch and ingestion phases of efferocytosis induce distinct and 
overlapping sets of SLC genes with functional consequences; these 
sets of genes are distinct from those that are activated during anti- 
body-mediated phagocytosis. Efferocytosis induces a metabolic gene 
program promoting glucose uptake and subsequent glycolysis, with 
concurrent downregulation of genes linked to OXPHOS and FAO. 
Although efferocytic macrophages are more M2-like”®, and M2-like 
macrophages are reported to be OXPHOS-dependent”!, our study of 
the first few hours of efferocytosis does not rule out a role for OXPHOS 
at later times. Although glycolysis is linked to inflammation, effero- 
cytic phagocytes can influence non-engulfing naive macrophages in 
the tissue microenvironment towards anti-inflammatory polarization 
by SLC16A1-mediated lactate release, as well as other factors?” released 
during efferocytosis such as TGFβ and IL-10.


Online content 

Any methods, additional references, Nature Research reporting summaries, source 
data, statements of data availability and associated accession codes are available at 
https://doi.org/10.1038/s41586-018-0735-5.

METHODS


The investigators were not blinded to allocation during experiments and outcome 
assessment. 

In vitro engulfment assay. For induction of apoptosis””, human Jurkat T cells 
resuspended in RPMI with 1% BSA were treated with 150 mJ/cm² ultraviolet C
irradiation (Stratalinker) and incubated for 4 h at 37°C with 5% CO2. For
antibody-dependent phagocytosis, Jurkat cells were labelled with CD3 antibody
(25 μg/ml; clone OKT3, BioLegend) along with annexin V recombinant protein (to block
efferocytosis of any residual dying cells) (3 μg/ml; eBioscience) for 1 h at 4°C and
the cells”? were then stained with CypHer5E (GE Healthcare, PA15401) or TAMRA 
(Invitrogen, C-1171) before use in the engulfment assays. Chinese hamster LR73 
cells or mouse macrophages were seeded in a 24-well plate and incubated with 
targets at a 1:10 phagocyte:target ratio for the indicated times. Targets were then 
washed with PBS. Where indicated, phagocytes were rested in culture medium for 
an additional period of time. Cells were dissociated from the plate with trypsin 
and the phagocytes were assessed by a flow cytometry-based assay or analysed 
for RNA or protein”. Phalloidin staining was conducted according to the man- 
ufacturer’s instruction (Invitrogen). When primary macrophages were used as 
phagocytes, there was an inherent difference in the absolute percentage uptake of 
corpses between experiments performed on different days. Therefore, phagocytic 
index was used to better compile data from multiple experiments. All cell lines 
(except LR73 cells) were obtained from ATCC. LR73 cells were obtained from 
C. Stanners”’. Cells were routinely tested for mycoplasma contamination and all cell
lines tested negative. 
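
The compilation of uptake data across days can be sketched as follows. This is only an illustration: it assumes that the phagocytic index is each condition's percentage uptake divided by the mean uptake of the same day's control, which is one common reading and is not spelled out in the text. The function name `phagocytic_index` and all numbers are hypothetical.

```python
from statistics import mean


def phagocytic_index(uptake_by_condition, control="control"):
    """Normalize one experiment's % uptake values to that experiment's control mean.

    Assumption (not stated in the Methods): index = uptake / mean(control uptake).
    """
    control_mean = mean(uptake_by_condition[control])
    if control_mean == 0:
        raise ValueError("control uptake is zero; index is undefined")
    return {
        condition: [value / control_mean for value in values]
        for condition, values in uptake_by_condition.items()
    }


# Made-up % CypHer5E-positive phagocytes from two separate days.
day1 = {"control": [40.0, 44.0, 36.0], "treated": [20.0, 22.0, 18.0]}
day2 = {"control": [20.0, 22.0, 18.0], "treated": [10.0, 11.0, 9.0]}

for label, day in (("day 1", day1), ("day 2", day2)):
    index = phagocytic_index(day)
    print(label, "treated index:", round(mean(index["treated"]), 2))
```

In this toy case the absolute uptake differs twofold between days, yet the normalized index of the treated group is the same on both days, which is the property that makes pooling possible.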

RNA sequencing. LR73 cells were co-cultured with apoptotic Jurkat cells for 2 h, 
unbound Jurkat cells were removed by washing with PBS and the phagocytes were 
rested in culture medium for an additional 2 h. Total RNA was extracted, and 
an mRNA library was prepared using the Illumina TruSeq platform, followed by 
sequencing using an Illumina NextSeq 500 cartridge. Four independent experi- 
ments were sequenced. R v.3.2.2 was used for graphical and statistical analysis and 
the R package DESeq2 was used for count normalization and differential gene 
expression analysis of RNA-seq data. All genes were curated using a combina- 
tion of literature mining and function determination (known or predicted) via 
Uniprot. Genes involved in multiple associated functions (for example, kinases or 
cell-cycle-related) were excluded. We generated functional classifications of solute 
carrier (SLC) proteins on the basis of several criteria: (1) a known function was
ascribed to the channel directly (for example, SLC2A1 is required for glycolysis); (2) a
known function was ascribed to the solute transported by the channel (for example,
SLC2A1 transports glucose, glucose is required for glycolysis, therefore SLC2A1 is
required for glycolysis); or (3) by predictive homology (for example, SLC2A1 transports
glucose and is required for glycolysis; as SLC2A4 is homologous to SLC2A1,
SLC2A4 may therefore be important for glycolysis). Several SLCs have been shown 
or are predicted to perform multiple functions. These alternative functions are 
listed in Supplementary Table 3. Network analysis was performed using the net- 
work analytical software Gephi v.0.9.1 (https://gephi.org/). Standard algorithms for 
calculating clusters were used as implemented in Gephi. The edges and community 
links were calculated using the ‘link communities’ plug-in based on the algorithms 
proposed for biological network analyses”*. Network structure was determined 
using the Fruchterman-Reingold force-directed layout algorithm. All code used 
for analysis is available upon request. 
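
As an illustration of the count normalization step, the sketch below implements the median-of-ratios size-factor idea that DESeq2 uses by default. It is written in plain Python for readability and is not the authors' R code; the function names and the toy counts are assumptions made for this example.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of genes, each a list of raw counts (one per sample).
    """
    n_samples = len(counts[0])
    # Genes with a zero in any sample are skipped because log(0) is undefined.
    usable = [gene for gene in counts if all(c > 0 for c in gene)]
    if not usable:
        raise ValueError("no gene has non-zero counts in every sample")
    # Log of the per-gene geometric mean across samples.
    log_geo_means = [sum(math.log(c) for c in gene) / n_samples for gene in usable]
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(gene[j]) - lgm for gene, lgm in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalize(counts):
    """Divide each sample's counts by that sample's size factor."""
    factors = size_factors(counts)
    return [[c / f for c, f in zip(gene, factors)] for gene in counts]


# Toy data: sample 2 was sequenced exactly twice as deeply as sample 1.
toy_counts = [[10, 20], [100, 200], [5, 10], [0, 3]]
print(size_factors(toy_counts))
print(normalize(toy_counts)[0])
```

After normalization the first three toy genes have equal values in both samples, so a depth difference alone is not mistaken for differential expression.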

qRT-PCR. Total RNA was extracted from cells using the RNeasy Mini Kit (Qiagen) 
and cDNA was synthesized using QuantiTect Reverse Transcription Kit (Qiagen), 
according to the manufacturers’ instructions. Quantitative gene expression analysis 
for hamster and mouse SLCs was performed using hamster- and mouse-sequence 
specific Taqman probes that are non-cross reactive with human sequences (Applied 
Biosystems), run on a StepOnePlus Real Time PCR System (Applied Biosystems). 
Details of Taqman primers are listed in Supplementary Information. 

In vivo engulfment assay and qRT-PCR. Six million apoptotic Jurkat cells, stained 
with CypHer5E, or as control X-VIVO 10 medium alone, were intraperitoneally 
injected (300 μl per mouse). At indicated times post-injection, mice were
euthanized, and the peritoneal lavage was collected in 10 ml PBS + 10% fetal bovine
serum (FBS). The collected cells were stained with CD11b PE-Cy7 (eBioscience,
Cat#: 25-0112-82) and F4/80 APC-eFluor 780 (eBioscience, Cat#: 47-4801-80),
and the uptake of the injected CypHer5E⁺ apoptotic cells by CD11b⁺F4/80hi
cells was assessed by flow cytometry. For sequencing or other types of analysis of
the responses of macrophages, the peritoneal macrophages were isolated using
Macrophage Isolation Kit (Miltenyi Biotec). To test the effect of drugs targeting
SLC2A1 and SLC16A1 on in vivo engulfment, mice were treated with STF-31
(Santa Cruz #sc-364692, 10 mg/kg) or SR13800 (EMD Millipore #509663,
10 mg/kg) for 1 h before administration of apoptotic cells or IgG-coated cells. For
qRT-PCR, the cells collected were lysed for RNA isolation. 

In vivo thymus efferocytosis assay. Six- to eight-week-old mice were injected 
intraperitoneally with 300 μl PBS containing 250 μg dexamethasone (Sigma) with
or without STF-31 (10 mg/kg) dissolved in EtOH. Four hours after injection, thymi
were collected from mice, and the numbers of thymocytes with annexin V staining
only (apoptotic) versus annexin V, 7-aminoactinomycin D (7-AAD) double- 
positive cells (secondarily necrotic) were determined by flow cytometry. 

In vivo bone marrow transplantation and analysis of necrotic cores in
atherosclerosis. At six weeks of age, Ldlr−/− mice received two doses of X-ray irradiation
(500 cGy × 2, 4 h apart; X-RAD), and were then transplanted with bone marrow
isolated from Slc2a1fl/fl or LysM-cre/Slc2a1fl/fl donor mice. Control animals were
transplanted with HBSS buffer only; these mice died within 10-11 days of lethal
irradiation. Following bone marrow transplantation, chimeric Ldlr−/− mice were
transferred to sterile cages with ad libitum access to sterile mouse chow and sterile
water and were maintained on chow diet for four weeks before challenge with
Western Diet (Harlan Teklad TD88137, 42% of kcal from milk fat with 0.15%
added cholesterol) for 12 weeks. After the mice were dissected, hearts were isolated
for formalin-fixed paraffin-embedded sections. The sections were stained with 
Masson's trichrome for quantification of necrotic core areas and normalized to 
total quantified area. Necrotic core size was measured using Aperio ePathology 
software (Aperio). TUNEL staining was conducted according to the manufacturer's 
instructions (Promega). 

In vivo phagocytosis analysis in zebrafish. To inhibit the expression of slc2a1,
antisense morpholinos targeting the translational start site of the zebrafish slc2a1a
(5’-GGCCATCATCAGCTGAGGAGTCACC-3’) were synthesized (Gene Tools).
A control morpholino (5’-GGCCATCATCAGCTGAGGAGTCACC-3’) was used
as a negative control. Morpholino (2.5 ng) was microinjected into Tg(mpeg1:GFP)
embryos at the one-cell stage. Embryos were treated with phenylthiourea (0.004%)
in egg water at 24 hpf to reduce pigmentation as per standard protocols. 8-10 h
before imaging, embryos were soaked in 2.5 μg/ml neutral red (Sigma) in egg
water. At 50 hpf, morphants were anaesthetized with 3-aminobenzoic acid ester 
(Tricaine), immersed in 0.8% low-melting-point agarose and mounted laterally in 
glass-bottomed 35-mm Petri dishes for confocal imaging. Three identical z-stack 
images were taken for each embryo covering hemi-segments of somites 6 to 20. 
More than 100 macrophages were counted per group for the evaluation of GFP 
and neutral red colocalization. 

CRISPR-Cas9 deletion or siRNA knockdown of SLCs. Stable, individual clones 
of Cas9-GFP-expressing LR73 cells were generated by lentiCas9-EGFP plasmid 
via lentiviral transduction and a protocol adapted from ref. *°, followed by single- 
cell cloning of GFP-expressing cells and Cas9 expression verification. Slc2a1 
was deleted from LR73 cells using two independent Cas9-GFP LR73 cell clones 
and using lentiGuide-Puro sgRNA plasmid with two unique guides for Slc2a1. 
LentiCas9-EGFP was a gift from P. Sharp and F. Zhang (Addgene plasmid # 63592) 
and lentiGuide-Puro was a gift from F. Zhang (Addgene plasmid # 52963). 

Guide RNAs targeting Slc2a1 were generated using the following oligonucleotide
pairs. Guide 1: 5’-CACCGATTCTTCCGGACATCATCGC-3’,
3’-CTAAGAAGGCCTGTAGTAGCGCAAA-5’; guide 2:
5’-CACCGTTCGGCCTGGACTCCATTA-3’,
3’-CAAGCCGGACCTGAGGTAATCAAA-5’.
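
The relationship between the two oligonucleotides of each pair can be checked with a short script. The sketch assumes the usual annealed-oligo design for this type of guide vector: the top strand carries a 5' CACC overhang, and the bottom strand, written 3' to 5' as in the text, is the base-by-base complement of the remaining top-strand sequence followed by CAAA (that is, a 5' AAAC overhang). The helper name `bottom_oligo_3to5` is hypothetical.

```python
# Base-pairing table for DNA.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def bottom_oligo_3to5(top_5to3):
    """Derive the bottom-strand oligo (written 3'->5') from a top-strand oligo.

    Assumes the top oligo starts with a CACC cloning overhang and that the
    bottom oligo ends (at its 5' end) with an AAAC overhang, read here as CAAA.
    """
    if not top_5to3.startswith("CACC"):
        raise ValueError("top oligo is expected to start with CACC")
    annealing_region = top_5to3[4:]
    return annealing_region.translate(COMPLEMENT) + "CAAA"


# Oligonucleotide pairs as listed in the Methods (top 5'->3', bottom 3'->5').
GUIDE_PAIRS = {
    "guide 1": ("CACCGATTCTTCCGGACATCATCGC", "CTAAGAAGGCCTGTAGTAGCGCAAA"),
    "guide 2": ("CACCGTTCGGCCTGGACTCCATTA", "CAAGCCGGACCTGAGGTAATCAAA"),
}

for name, (top, bottom) in GUIDE_PAIRS.items():
    print(name, "pair is consistent:", bottom_oligo_3to5(top) == bottom)
```

Both listed pairs satisfy this relationship, which is a quick way to catch transcription errors in oligo sequences before ordering them.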

For siRNA and plasmid transduction experiments, LR73 cells were treated 
with Lipofectamine 2000 (Thermo Fisher) with specific siRNAs, according to the 
manufacturer's instructions, 2 days before the engulfment assay. GLUT1-eGFP/ 
pcDNA-DEST47 was a gift from W. Frommer (Addgene plasmid #18729)”. siR- 
NAs targeting hamster mRNAs were customized by GE Healthcare Dharmacon. 

siRNA against Slc2a1: 5’-CCAAGAGUGUGCUGAAGAAUU-3’.

Two siRNAs against Sgk1: 5’-CUUCUAUGCUGCUGAAAUAUU-3’,
5’-CUGCAGAAGGACAGGACAAUU-3’. Two siRNAs against Pdk1:
5’-CGACAAGAGUUGCCUGUUAUU-3’, 5’-GGACAAAAGUGCUGAAGAUUU-3’.
Two siRNAs against Pdk4: 5’-UCACACAAGUAAAUGGAAAUU-3’,
5’-CAUCAAAGUUCGAAACAGAUU-3’. siRNA against Slc16a1:
5’-AGAAACAGGAAGAAGGUAAUU-3’.

Macrophage isolation and analysis. To obtain BMDMs, femurs from control mice,
mice carrying floxed alleles of Slc2a1, or Glut1-Myc knock-in mice were removed 
and flushed with 5 ml sterile PBS containing 5% FBS”*”°. The cell suspension was 
centrifuged, treated with red blood cell lysis buffer, washed and then plated onto 
sterile Petri dishes in DMEM containing 10% L929 medium, 10% FBS and 1% 
penicillin-streptomycin-glutamine. Medium was replenished every 2-3 days and 
differentiated cells were used at day 6 post-collection. To delete Slc2a1, macrophage 
cultures were treated with TAT-Cre (EMD Millipore), according to the manufac- 
turer’s instructions. For staining, BMDMs were stained with F4/80 APC-eFluor 
780 (eBioscience, Cat#: 47-4801-80) and subsequently stained with Myc PerCP 
antibody (Novus Biologicals, 9E10 Cat#: NB600-302PCP) or fixed and permea- 
bilized using FoxP3/Transcription Factor Staining Buffer Set (eBioscience), and 
intracellular staining was performed using CD206 PE (BioLegend, Cat#: 141706). 
Resident peritoneal macrophages were obtained by flushing the peritoneal cavity 
of mice with 10 ml cold PBS containing 5% FBS. Collected cells were spun down, 
resuspended in X-VIVO 10 (Lonza), and plated at a concentration of 5 x 10° cells 
per well. Floating cells were removed the next day, and remaining peritoneal
macrophages were used 2 days after isolation.

Glucose uptake assay. LR73 cells were incubated with apoptotic Jurkat cells for 2 h,
washed 3 times with PBS and incubated with 10 mM 2-DG, a glucose analogue, in 
glucose-free medium for 30 min. Following incubation, cells were washed 3 times 
with PBS and lysed with Extraction Buffer (Sigma Cat#: MAK083). Lysate was 
frozen-thawed in dry ice/ethanol, and then heated at 85°C for 40 min. Lysate was 
then cooled on ice for 5 min and then neutralized by Neutralization Buffer (Sigma 
Cat#: MAK083). Samples were spun down at 13,000g to remove insoluble fraction 
and then diluted tenfold by adding Assay Buffer (Sigma Cat#: MAK083). Using the 
lysate, glucose uptake was measured using Glucose Uptake Colorimetric Assay Kit 
(Sigma). 2-DG is taken up by cells and phosphorylated by hexokinase to 2-DG6P. 
2-DG6P cannot be further metabolized and accumulates in cells, directly pro- 
portional to the glucose uptake by cells. 2-DG uptake is determined by a coupled 
enzymatic assay in which the 2-DG6P is oxidized, resulting in the generation of 
NADPH, which is then determined by a recycling amplification reaction in which 
the NADPH is used by glutathione reductase in a coupled enzymatic reaction that 
produces glutathione. Glutathione reacts with DTNB to produce TNB, which was 
detected at 412 nm as per the manufacturer’s recommendations. 

Seahorse analysis. LR73 cells or BMDMs were seeded into a Seahorse 24-well 
tissue culture plate (Agilent Technologies). The cells adhered overnight before 
treatment. For assessing respiratory capacity, cells were subjected to a mitochon- 
drial stress test. In brief, at the beginning of the assay, the medium was changed 
to DMEM containing pyruvate (Thermo Fisher Cat#: 12800017, pH = 7.35 at
37°C) and cells were allowed to equilibrate for 30 min. OCR was measured using 
a Seahorse XF24 Flux Analyzer (Agilent Technologies). After three basal OCR 
measurements, the drugs of interest were injected into the plate and OCR was 
measured using four-minute measurement periods. Compounds to modulate 
cellular respiratory function (1 μM oligomycin (Sigma-Aldrich); 2 μM BAM15
(Cayman Chemical Company); 1 μM antimycin A and 100 nM rotenone
(Sigma-Aldrich)) were injected after every three measurements. Basal respiration was
calculated by subtracting the average of the post-antimycin A and rotenone
measurements from the average of the first three measurements. Maximum respiratory
capacity was calculated by subtracting the average of the post-antimycin A and
rotenone measurements from the average of the post-BAM15 measurements. The reserve
capacity was calculated by subtracting the average of the basal measurements from
the average of the post-BAM15 measurements.

For assessing glycolytic capacity, the cells were subjected to a glycolytic stress 
test. In brief, ECAR—a measurement of lactate export—was measured using a 
Seahorse XF24 Flux Analyzer. Cells were seeded into a Seahorse 24-well tissue 
culture plate. At the beginning of the assay, the medium was changed to unbuffered, 
glucose-free DMEM (Sigma-Aldrich Cat# D5030, pH 7.35 at 37°C), supplemented 
with 143 mM NaCl and 2 mM glutamine. After three basal ECAR measurements, 
the drugs of interest were injected into the plate and ECAR was measured using 
3-min measurement periods. Compounds to modulate glycolysis (20 mM glucose; 
1 μM oligomycin; 80 mM 2-DG) (Sigma) were injected after every three
measurements. Basal glycolysis was calculated by subtracting the average of the post-2-DG
measurements from the average of the post-glucose measurements. Maximum
glycolytic capacity was calculated by subtracting the average of the post-2-DG
measurements from the average of the post-oligomycin measurements. The
glycolytic reserve capacity was calculated by subtracting the average of the
post-glucose measurements from the average of the post-oligomycin measurements.
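
The subtractions described for the mitochondrial and glycolytic stress tests can be written out as a small sketch. The function names and example readings are hypothetical, and the glycolytic reserve is taken here as the post-oligomycin average minus the post-glucose average, an assumption chosen so that the value is positive when oligomycin raises ECAR.

```python
from statistics import mean


def mito_stress_metrics(basal, post_bam15, post_aa_rot):
    """OCR-derived metrics; each argument is a list of OCR readings."""
    non_mito = mean(post_aa_rot)  # residual OCR after antimycin A and rotenone
    return {
        "basal_respiration": mean(basal) - non_mito,
        "maximal_respiration": mean(post_bam15) - non_mito,
        "reserve_capacity": mean(post_bam15) - mean(basal),
    }


def glyco_stress_metrics(post_glucose, post_oligomycin, post_2dg):
    """ECAR-derived metrics; each argument is a list of ECAR readings."""
    non_glyco = mean(post_2dg)  # residual ECAR after 2-DG
    return {
        "basal_glycolysis": mean(post_glucose) - non_glyco,
        "glycolytic_capacity": mean(post_oligomycin) - non_glyco,
        # Assumption: reserve = post-oligomycin minus post-glucose.
        "glycolytic_reserve": mean(post_oligomycin) - mean(post_glucose),
    }


# Made-up readings, three measurements per phase.
ocr = mito_stress_metrics([100, 102, 98], [200, 210, 190], [20, 22, 18])
ecar = glyco_stress_metrics([30, 32, 28], [50, 52, 48], [10, 12, 8])
print(ocr)
print(ecar)
```

Averaging the three readings of each phase before subtracting mirrors the three-measurement injection scheme described above.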

Liposome construction. Liposomes were prepared by dissolving the lipids
(phosphatidylserine, dioleoyl phosphatidylcholine, cholesterol and the lipid DiD dye)
in chloroform, evaporating chloroform under flow of argon gas in a glass vial and
subjecting the lipid layer to overnight lyophilization to remove traces of organic
solvent. Normal saline was then added for hydration, followed by vortexing to
prepare multilamellar vesicles. Particle size was verified by dynamic light
scattering using Nicomp 370.

Determination of lactate concentration. The lactate concentration was measured 
using a Lactate Assay Kit (Sigma) according to the manufacturer's instructions. 
The mean values ± s.d. of the lactate concentration in the medium and cells were
calculated for each condition. 

Immunoblotting. LR73 cells were seeded in a 100-mm dish at a concentration 
of 2 million cells per dish. Apoptotic Jurkat cells were added as indicated. Cells 
were lysed in RIPA buffer and immunoblotted with SLC2A1 (Abcam #ab652), 
SLC16A1 (LSBio LS-C335287), SLC12A2 (Cell Signaling Technology #14581) and 
total Erk2 (Santa Cruz Biotechnology, #sc-154-G) antibodies in Can Get Signal 
solution (TOYOBO Cat# NKB-101) followed by chemiluminescence detection. 
Specific bands were quantified using Adobe Photoshop CS6. 

Research animals. Power of 80% was used to determine the number of mice 
needed to achieve a two-sided 5% significance level to detect a twofold change for 
each set of in vivo studies. Allocation of mice was random in all in vivo experi- 
ments. Mice were taken from littermates. Animal breeding and experiments were 
performed in a specific pathogen-free animal facility using protocols approved 
by the University of Virginia Animal Studies Committee. Ethical guidelines deter-
mined by the Institutional Animal Care and Use Committee were followed in all 
experiments performed in this manuscript. 
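
To make the power statement concrete, the sketch below estimates group size with the standard normal-approximation formula for comparing two means, n = 2(z₁₋α/₂ + z_power)² / d², where d is the standardized difference. The Methods do not state which calculation or variance estimate was used, so the coefficient of variation (`cv`) is a hypothetical input and the function name is illustrative.

```python
import math
from statistics import NormalDist


def mice_per_group(fold_change, cv, alpha=0.05, power=0.80):
    """Approximate animals per group for a two-sided, two-group comparison.

    Assumptions: both groups share the same absolute s.d., equal to
    cv * control mean; a normal approximation to the t-test is used.
    """
    if fold_change == 1 or cv <= 0:
        raise ValueError("need fold_change != 1 and cv > 0")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    # Difference between group means expressed in units of the s.d.
    effect = abs(fold_change - 1) / cv
    return math.ceil(2 * ((z_alpha + z_power) / effect) ** 2)


for cv in (0.5, 1.0):
    print("cv =", cv, "->", mice_per_group(2.0, cv), "mice per group")
```

Because the normal approximation slightly underestimates the requirement at small group sizes, such a figure is best read as a lower bound.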

Statistical analysis. Statistical significance was determined using GraphPad 
Prism 7, using unpaired Student's two-tailed t-test, one-way ANOVA or two-way 
ANOVA, according to test requirements. Grubbs’ outlier test was used to
determine outliers, which were excluded from final analysis. *P < 0.05, **P < 0.01 and
***P < 0.001 were considered significant. Number of replicates and repeats of
individual experiments and statistical tests used are shown in Supplementary Table 4.

Code availability. All code used in this manuscript can be accessed from the
Github repository https://github.com/perryjs/Perry-R. 

Reporting Summary. Further information on research design is available in 
the Nature Research Reporting Summary linked to this paper. 


Data availability 
RNA sequencing data presented in this study have been deposited in the NCBI 
GEO repository under the accession number GSE119273.

Extended Data Fig. 1 | RNA preparation for RNA-seq experiments.
a, Representative fluorescence-activated cell sorting plots of engulfment
assays with LR73 hamster phagocytes (left) and annexin V/7-AAD
staining of apoptotic human Jurkat cells (right) in conditions matching
experiments performed for RNA-seq (2 h with apoptotic cells followed by
2 h rest in the absence of apoptotic cells). b, Principal component analysis
on hamster-genome-aligned RNA-seq data as a quality control statistic.

Extended Data Fig. 2 | Regulation of SLC expression during
efferocytosis. a, SLC genes are differentially regulated during
efferocytosis. Left, plot of the 165 SLC genes detected by RNA-seq of
efferocytic LR73 cells, highlighting the 19 significantly upregulated
(red) and 14 downregulated (blue) SLC genes that were altered during
efferocytosis. The 132 SLC genes that were not altered are located on the
midline (black). Right, the current genetic classifications of these 33 SLC
genes that are altered during engulfment are shown. b, Efferocytosis-associated
SLCs and their properties. Current genetic classification and/or
functional linkages of the 33 SLCs modulated during apoptotic cell
engulfment. The significantly upregulated and downregulated SLCs and
the substrates they are known to transport grouped by predicted general
function are shown, as are the known monogenic diseases and single
nucleotide polymorphism (SNP) or disease phenotype to which the
specific SLCs have been linked.

Extended Data Fig. 3 | RT-PCR confirmation of the RNA-seq data.
a, qRT-PCR of mRNA of specific SLCs during efferocytosis. Indicated
SLC genes were tested for mRNA expression levels during engulfment
assays performed similarly to those in Fig. 1a. Data are representative of
at least two independent experiments with 3-4 replicates per condition.
b, The table presents the cycle numbers for each species-specific qRT-PCR
primer. None of these primers produced signals when tested against
human Jurkat cell mRNA (target) alone.

Extended Data Fig. 4 | Dynamic expression of SLCs during
efferocytosis. a, Schematic of the experiment and time points when RNA
from phagocytes was assessed for specific SLC gene expression. Apoptotic
Jurkat cells were added to LR73 cells and co-cultured for 2 h. Unbound
and floating apoptotic cells were then washed away, and the LR73 cells
were cultured in fresh medium for the indicated times. The time scale
bar reflects total time of experiment, such that the 4-h time point reflects
2 h with apoptotic cells plus 2 h subsequent incubation (to match the
timeframe used in our RNA-seq experiment). Total RNA was subsequently
isolated and qRT-PCR was performed for specific SLC genes. Flow
cytometry plots indicate that fluorescent signals from the internalized
corpses are significantly degraded by the 8 h time point. b, Expression
of SLC genes is regulated over the time course of efferocytosis. Relative
expression of mRNAs for specific SLC genes belonging to different
functional classes over the time course of engulfment is shown. Data are
representative of three biological replicates. c, Immunoblotting for some
of the SLCs modified during efferocytosis. Indicated SLCs were probed at
various time points after addition of apoptotic cells. Relative intensities
of specific bands, normalized to ERK2, are shown below representative
blots. d, Immunoblotting for some of the SLCs in LR73 phagocytes and
apoptotic Jurkat cells.

Extended Data Fig. 5 | The role of SLC2A1 in efferocytosis. a, Slc2a1fl/fl
BMDMs were treated with or without TAT-Cre to delete Slc2a1. The
cells were then incubated with IgG-coated Jurkat cells and engulfment
was assessed by CypHer5E signal within BMDMs. The uptake by control
BMDMs (not treated with TAT-Cre, and denoted wild type (WT)) was
set to 1. b, siRNA targeting of Slc2a1 downregulates SLC2A1 protein
expression. Representative western blots from siRNA knockdown of
Slc2a1 versus scrambled siRNA in LR73 cells are shown. LR73 cells
expressing siRNA-resistant SLC2A1 are also shown. c, Slc2a1 deletion
efficiency in Cas9 LR73 cells. Slc2a1 guide was introduced into
Cas9-EGFP⁺ LR73 cell clones. The efficiency of Slc2a1 deletion was quantified
using qRT-PCR. d, Introduction of TAT-Cre into Slc2a1fl/fl BMDMs
efficiently knocks down SLC2A1 protein expression. Slc2a1fl/fl bone
marrow cells were treated with recombinant TAT-Cre during macrophage
differentiation after isolation from the bone marrow. e, STF-31 did not
affect antibody-mediated phagocytosis by peritoneal macrophages.
C57BL/6 mice were intraperitoneally injected with 10 mg kg⁻¹ of
STF-31 in X-VIVO medium 1 h before injection of IgG-coated Jurkat cells.
CypHer5E-labelled Jurkat cells were injected intraperitoneally along with
the drug. Mice were euthanized 1 h later, peritoneal cells were collected,
and apoptotic cell engulfment by CD11b⁺F4/80hi macrophages was
analysed by fluorescence-activated cell sorting. f, Slc2a1-deficient LR73
cells or BMDMs were treated with STF-31, and the engulfment assay was
conducted using CypHer5E-labelled apoptotic Jurkat cells. CypHer5E⁺
phagocytic cells after 2 h of incubation were identified by flow cytometry.
n.s., not significant. Data are representative of at least two independent
experiments with 3-4 replicates per condition. g, The SLC2A1
inhibitor STF-31 does not increase the number of thymocytes stained
with 7-aminoactinomycin D (7AAD⁺) in vitro. Isolated thymocytes
were incubated with dexamethasone (10 μM) with or without STF-31
(2 mM). Four hours later, the cell death of the thymocytes was indicated
by annexin V⁺7AAD⁺. Data are representative of two independent
experiments.

Extended Data Fig. 6 | The role of glycolytic genes in efferocytosis.
… concentration is lost in siRNA-treated conditions. Data are representative
of at least three independent experiments with 3-4 replicates per …
… undergo glycolytic flux during apoptotic cell clearance. Glycolytic flux and
respiration tests. Data are representative of four replicates per condition.
d, Genes within the glycolytic pathway that are significantly upregulated …
Extended Data Fig. 7 | Testing SGK1 and glycolysis in efferocytosis.
a, Differential metabolic requirements of macrophages for efferocytosis
versus antibody-mediated phagocytosis. BMDMs were co-cultured with
apoptotic or antibody-coated Jurkat cells. Mitochondrial respiration was
inhibited by addition of the mitochondrial complex I inhibitor rotenone
(200 nM), the mitochondrial complex III inhibitor antimycin A1 (1 µM),
or both (R + A). Aerobic glycolysis was inhibited by the addition of the
pan-PDK inhibitor dichloroacetate (1 mM). Data are representative of
three independent experiments. b, SGK1 inhibition blocks efferocytosis
in vitro. LR73 cells were treated with SGK1 inhibitor and uptake of
CypHer5E-labelled apoptotic Jurkat cells was assessed. c, BMDMs from
GLUT1-Myc knock-in mice were co-cultured with apoptotic thymocytes
with or without the SGK1 inhibitor GSK650394 (5 µM) for 2 h, unbound
apoptotic cells were washed away and the cell-surface expression of
SLC2A1 was measured by flow cytometry after staining for surface Myc
tag. Data are representative of at least two independent experiments.
d, Continued uptake of apoptotic thymocytes was determined by the MFI
(indicative of corpse-derived signal per phagocyte) of LR73 phagocytes
over a time course of engulfment. SLC2A1 or SGK1 inhibitors were
added at the beginning of engulfment (left of each pair of graphs) or
1 h post-apoptotic cell addition (right of each pair of graphs). Data
are representative of at least three independent experiments with 3-4
replicates per condition.
Extended Data Fig. 8 | Testing SLC16A1 in efferocytosis. a, RT-PCR
determination of Sgk1 expression in phagocytes treated with ATP. LR73
cells were treated with indicated amounts of ATP for 4 h. Expression
of Sgk1 was determined by qRT-PCR using hamster-specific primers.
Data are representative of at least two independent experiments with
3-4 replicates per condition. b, RT-PCR determination of Slc2a1,
Slc16a1 and Sgk1 expression in phagocytes after addition of the
PtdSer-masking peptide (GST-TSR) during efferocytosis. Apoptotic cells (AC)
were added with or without TSR peptide (10 ng µl⁻¹) for 4 h. Expression
of indicated genes was determined by qRT-PCR using hamster-specific
primers. Data are representative of at least two independent
experiments with 3-4 replicates per condition. c, SLC16A1 inhibition
blocks efferocytosis in vitro. LR73 cells were treated with Slc16a1 siRNA
and uptake of CypHer5E-labelled apoptotic Jurkat cells was assessed.
d, SLC16A1 inhibitor SR13800 dampens efferocytosis by peritoneal
macrophages. C57BL/6 mice were injected intraperitoneally with SR13800
(10 mg kg⁻¹) in X-VIVO medium 1 h before injection of apoptotic cells.
CypHer5E-labelled apoptotic Jurkat cells were injected intraperitoneally.
After 1 h, apoptotic cell engulfment by CD11b⁺F4/80⁺ peritoneal
macrophages was analysed by flow cytometry. Data are representative
of two independent experiments with at least six mice in each group per
experiment. e, f, Supernatants were prepared from LR73 cells, treated
with control or Slc16a1 siRNA, that were engulfing apoptotic cells. The
supernatants were added to BMDMs and incubated for 12 h. e, Expression
of inflammatory markers was determined by qRT-PCR. f, After 24 h
of incubation, expression of CD206 and F4/80 was determined by flow
cytometry. Data are representative of two independent experiments with
2-3 replicates per condition.
Mannose impairs tumour growth and enhances chemotherapy

Pablo Sierra Gonzalez, James O’Prey, Simone Cardaci, Valentin J. A. Barthet, Jun-ichi Sakamaki, Florian Beaumatin,
Antonia Roseweir, David M. Gay, Gillian Mackay, Gaurav Malviya, Elzbieta Kania, Shona Ritchie, Alice D. Baudot,
Barbara Zunino, Agata Mrowinska, Colin Nixon, Darren Ennis, Aoisha Hoyle, David Millan, Iain A. McNeish,
Owen J. Sansom, Joanne Edwards & Kevin M. Ryan


It is now well established that tumours undergo changes in cellular
metabolism¹. As this can reveal tumour cell vulnerabilities and
because many tumours exhibit enhanced glucose uptake², we have
been interested in how tumour cells respond to different forms of 
sugar. Here we report that the monosaccharide mannose causes 
growth retardation in several tumour types in vitro, and enhances 
cell death in response to major forms of chemotherapy. We then 
show that these effects also occur in vivo in mice following the 
oral administration of mannose, without significantly affecting 
the weight and health of the animals. Mechanistically, mannose is 
taken up by the same transporter(s) as glucose³ but accumulates
as mannose-6-phosphate in cells, and this impairs the further 
metabolism of glucose in glycolysis, the tricarboxylic acid cycle, 
the pentose phosphate pathway and glycan synthesis. As a result, 
the administration of mannose in combination with conventional 
chemotherapy affects levels of anti-apoptotic proteins of the 
Bcl-2 family, leading to sensitization to cell death. Finally we 
show that susceptibility to mannose is dependent on the levels of 
phosphomannose isomerase (PMI). Cells with low levels of PMI 
are sensitive to mannose, whereas cells with high levels are resistant, 
but can be made sensitive by RNA-interference-mediated depletion 
of the enzyme. In addition, we use tissue microarrays to show 
that PMI levels also vary greatly between different patients and 
different tumour types, indicating that PMI levels could be used 
as a biomarker to direct the successful administration of mannose. 
We consider that the administration of mannose could be a simple, 
safe and selective therapy in the treatment of cancer, and could be 
applicable to multiple tumour types. 

As tumours often have a high avidity for glucose², we examined
the effect of other sugars on the growth of tumour cells. This revealed 
that mannose, in contrast to other sugars, significantly reduced the 
growth of U2OS cells (Fig. 1a). Similar effects occurred in Saos-2 cells, 
with fucose also causing a comparatively small decrease in cell growth 
(Extended Data Fig. 1a). Using a panel of cell lines, we observed that 
this effect of mannose occurs in cells from various tissues, with the 
effect being greater in some cells than others (Fig. 1b, c, Extended Data 
Fig. 1a-d).

Mannose is imported into cells by means of the same transporters as
glucose³, so we considered that mannose might interfere with the uptake
of glucose. In support of this hypothesis, we observed that mannose
enhances levels of phosphorylated AMPK, a read-out of energy balance
in cells®° (Extended Data Fig. 1e). However, liquid chromatography
coupled with mass spectrometry (LC-MS) analyses showed that levels 
of the phosphorylated form of the glucose analogue 2-deoxyglucose 
(2-DG-P)—used as a proxy for glucose uptake—did not correlate with 
mannose sensitivity in mannose-sensitive compared with mannose- 
insensitive cell lines (Fig. 1a, b, Extended Data Fig. 1f, g). In fact, man- 
nose increased the intracellular pool of hexose-6-phosphate, which is 


produced in the first step of the metabolism of glucose and mannose 
(Extended Data Fig. 1h). This effect was not observed with other sugars 
(Extended Data Fig. 1i). 

To determine whether the mannose-induced increase in 
hexose-6-phosphate levels is due to mannose or glucose, we performed 
LC-MS with an extended chromatography time to enable selective 
detection of the unphosphorylated form of these sugars. This revealed 
that mannose increased the intracellular pool of mannose, as expected, 
but it also increased the intracellular pool of glucose (Extended Data 
Fig. 1j, k). To corroborate this finding, we treated cells with glucose
labelled with two atoms of ¹³C (1,2-¹³C₂-glucose) and mannose labelled
with six atoms of ¹³C (¹³C₆-mannose) to enable detection due to
differences in molecular mass; we again found that treatment with mannose
increased intracellular levels of glucose (Extended Data Fig. 1l).
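
The mass-based discrimination between the two labelled sugars can be illustrated with a short calculation. The sketch below is not part of the original analysis: it uses standard monoisotopic isotope masses, and the function names are hypothetical. It shows why a hexose carrying two ¹³C atoms appears as m + 2 and one carrying six ¹³C atoms appears as m + 6.

```python
# Monoisotopic masses (Da) of the isotopes that make up a hexose.
MASS = {"12C": 12.0, "13C": 13.00335484, "1H": 1.00782503, "16O": 15.99491462}


def hexose_mass(n_13c: int = 0) -> float:
    """Monoisotopic mass of a C6H12O6 hexose carrying n_13c atoms of 13C."""
    if not 0 <= n_13c <= 6:
        raise ValueError("a hexose has six carbon atoms")
    return ((6 - n_13c) * MASS["12C"] + n_13c * MASS["13C"]
            + 12 * MASS["1H"] + 6 * MASS["16O"])


def mass_shift(n_13c: int) -> float:
    """Mass difference relative to the unlabelled (m + 0) hexose."""
    return hexose_mass(n_13c) - hexose_mass(0)


# Glucose and mannose are isomers, so unlabelled they share one mass;
# the label count is what separates them in the mass spectrum.
glucose_shift = mass_shift(2)   # 1,2-13C2-glucose, nominal m + 2
mannose_shift = mass_shift(6)   # 13C6-mannose, nominal m + 6
```

Because the two isomers are otherwise indistinguishable by mass, the roughly 4 Da gap between the m + 2 and m + 6 species is what lets each pool be tracked separately.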

As mannose did not reduce intracellular levels of glucose, but 
significantly affected cell growth, we considered that it might interfere 
with glucose metabolism. There is considerable crosstalk between the 
metabolism of these sugars, and mannose-6-phosphate can inhibit 
three enzymes that mediate glucose metabolism: hexokinases, phos- 
phoglucose isomerase (PGI) and glucose-6-phosphate dehydrogenase’ 
(Fig. 1d). To address this possibility, we measured the levels of 
hexose-6-phosphate and lactate when cells were incubated in
glucose-free medium supplemented with either 1,2-¹³C₂-glucose or
¹³C₆-mannose. This revealed that mannose accounts for more of the
mannose-induced hexose-6-phosphate accumulation than does glucose, 
but also that glucose-6-phosphate is produced in the presence of mannose, 
albeit at lower levels when compared with cells treated with glucose 
alone (Fig. 1e). We also observed that mannose produces only small
amounts of lactate—indicating that it is poorly metabolised—and more 
notably, mannose markedly reduces the production of lactate from 
glucose (Fig. 1f). Further analysis revealed that mannose treatment 
not only affects pools of glycolytic intermediates, but also affects those 
involved in the tricarboxylic acid cycle, the pentose phosphate pathway 
and glycan synthesis (Fig. 1g-i). As with the effects on cell growth, these 
metabolic effects were not observed to the same extent with other sugars 
(Extended Data Fig. 1m-p). Moreover, although mannose uptake was
not lower in mannose-insensitive cells when compared to mannose- 
sensitive cells (Extended Data Fig. 2a), the effects of mannose on metab- 
olism were only observed in cells sensitive to the sugar (Extended Data 
Fig. 2b-h). We also found that mannose-induced AMPK phosphoryl- 
ation does not occur concomitantly with changes in the levels of AMP 
or ATP, but is associated with a decrease in fructose-1,6-bisphosphate 
levels, as has been recently described® (Extended Data Fig. 3a-f). 

Because mannose affects the growth of tumour cells, we questioned
whether it can also affect the cellular response to chemotherapeutic
drugs. Although mannose alone did not affect cell viability, it
significantly enhanced cell death when administered with cisplatin
or doxorubicin, an effect not seen with other hexoses (Fig. 2a, b,
Extended Data Fig. 4a-c). Mechanistically, the combination of mannose
and the chemotherapeutic drug increased the levels of cleaved
poly(ADP-ribose) polymerase (PARP, a substrate® of caspase-3) and
this effect, together with cell death, was blocked by the pan-caspase
inhibitor zVAD-FMK (Fig. 2c, d, Extended Data Fig. 4d).

Fig. 1 | Mannose impairs the growth of cancer cells and interferes with
glucose metabolism by accumulating intracellularly as
mannose-6-phosphate. a, Growth curves of U2OS-E1a cells supplemented without
(control) or with 25 mM of the hexoses stated. b, c, Growth curves of
Saos-2 cells in DMEM alone (−mannose) or DMEM with an additional
25 mM mannose (+mannose) (b) and KP-4 cells in IMDM alone or
with 25 mM mannose (c). d, Scheme of mannose metabolism: mannose
enters the cells using the same transporters as glucose (GLUT) and
is phosphorylated into mannose-6-phosphate by hexokinases (HK).
Mannose can then be used for glycosylation purposes or isomerized
into fructose-6-phosphate by PMI; both PMI and PGI can also produce
mannose-6-phosphate from glucose-6-phosphate. Mannose-6-phosphate
also participates in the biosynthesis of deaminoneuraminic acid (KDN).
Fruc, fructose; glc, glucose; man, mannose; TCA cycle, tricarboxylic acid
cycle. e, f, Extraction of intracellular metabolites and measurement of the
peak area per microgram of protein of glucose-6-phosphate (G6P) (m + 2)
and mannose-6-phosphate (M6P) (m + 6) (e) and lactate (m + 0, m + 2,
m + 3) (f), after 6-h incubation of U2OS-E1a cells in 10% dialysed FBS
in glucose-free DMEM complete medium in the presence of 5 mM
1,2-¹³C₂-D-glucose alone, 5 mM ¹³C₆-D-mannose alone, or both sugars
in combination. g-i, Relative amounts per microgram of protein of the
intracellular metabolites glyceraldehyde-3-phosphate (GA3-P) (left) and
phosphoenolpyruvate (PEP) (right) (g); α-ketoglutarate (α-KG) (left)
and malate (right) (h); ribose-5-phosphate (ribose-5P) (left) and
UDP-N-acetyl-glucosamine (UDP-GlcNAc) (right) (i) after a 6-h incubation
of U2OS-E1a cells in 10% dialysed FBS in DMEM complete medium
with 5 mM glucose, with or without 5 mM mannose as indicated. n = 3
independent experiments (a-c, g-i); data are representative of three
independent experiments (e, f). Data are mean ± s.e.m. and were analysed
by two-way ANOVA with Bonferroni correction (a-c) or paired two-tailed
Student’s t-test (g-i). *P < 0.05, **P < 0.01, ***P < 0.001.
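
The summary statistics quoted in the figure legends (mean ± s.e.m., Bonferroni-corrected P values and star annotations) can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, the Bonferroni step is the generic multiply-by-number-of-comparisons adjustment rather than the authors' full two-way ANOVA workflow, and the "ns" label for non-significant values is our own convention, not one used in the legends.

```python
import math
import statistics


def mean_sem(values):
    """Return (mean, standard error of the mean) for replicate measurements."""
    n = len(values)
    if n < 2:
        raise ValueError("s.e.m. needs at least two replicates")
    return statistics.fmean(values), statistics.stdev(values) / math.sqrt(n)


def bonferroni(p_values):
    """Generic Bonferroni adjustment: multiply each P by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def stars(p):
    """Map a P value to the thresholds used in the legends (*, **, ***)."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "ns"  # assumption: label for P >= 0.05


# Hypothetical readings from n = 3 independent experiments.
mean, sem = mean_sem([10.0, 12.0, 14.0])
adjusted = bonferroni([0.01, 0.04, 0.5])
labels = [stars(p) for p in adjusted]
```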

Because our data indicated that cell death after treatment with man- 
nose and the chemotherapeutic drug is likely to proceed by apoptosis, 
we used CRISPR-Cas9 to determine which apoptotic pathways might 
be involved. Disruption of caspase 8 and FADD—two components of 
the extrinsic apoptotic pathway’—had no effect on cell death (Extended 
Data Fig. 4e-g). By contrast, disruption of Bax and Bak (essential
factors for mitochondrial outer membrane permeabilization and the
intrinsic pathway¹⁰) markedly reduced cell death after treatment with
mannose in combination with either cisplatin or doxorubicin (Fig. 2e,
Extended Data Fig. 4h, i).

As mitochondrial outer membrane permeabilization is controlled
by members of the Bcl-2 family¹¹, we examined the levels of these
proteins in the presence of mannose, cisplatin or both in combination. In
agreement with previous studies that detail a role for the Noxa/Mcl-1
axis in cell death induced by glycolysis inhibition and glucose
deprivation¹²⁻¹⁴, we found that levels of Noxa increased after treatment
with cisplatin and cisplatin plus mannose, and levels of Mcl-1 and
Bcl-XL decreased after the combination treatment (Fig. 2f, Extended
Data Fig. 4j). Mechanistically, we consider that the change in Mcl-1
and Bcl-XL levels is due to decreased translation because the levels
of the proteins themselves were still decreased in the presence of a
proteasome inhibitor, and there was no change in the levels of their
mRNAs, after the combination treatment (Extended Data Fig. 4k-m).
The involvement of these proteins was confirmed by CRISPR-mediated
disruption of PMAIP1 (also known as NOXA) or by overexpression of
Mcl-1 or Bcl-XL, which all suppressed cell death induced by mannose
and chemotherapy (Fig. 2g, h, Extended Data Figs. 4n-q, 5a). Finally,
we also observed that mannose enhances cell death induced by the
Bcl-XL antagonist WEHI539, but not the Bcl-2 antagonist ABT-199
(Extended Data Fig. 5b).

We were keen to find out whether mannose also has effects in vivo. 
Tumour-bearing mice were given a single oral gavage of mannose, 
which resulted in a serum concentration of approximately 3 mM 
(Extended Data Fig. 6a). Mice were then injected with
2-deoxy-2-[¹⁸F]fluoro-D-glucose ([¹⁸F]FDG) to monitor [¹⁸F]FDG uptake and its
subsequent conversion by hexokinases into [¹⁸F]FDG-6-phosphate¹⁵. As
mannose impedes hexokinases®, we found that the [¹⁸F]FDG signal
(provided by [¹⁸F]FDG and/or [¹⁸F]FDG-6-phosphate) was significantly
reduced in mannose-treated mice when compared with control
mice bearing tumours of an equivalent size (Fig. 3a, Extended Data
Fig. 6b, c). A significant effect of mannose treatment on the [¹⁸F]FDG
signal was also seen in certain normal tissues (Extended Data Fig. 6d). 

To test whether mannose could affect tumour growth, mice were 
injected with tumour cells subcutaneously and were given mannose 
both freely in drinking water and three times a week by oral gavage. 
This did not affect the weight of the mice, nor did it visibly affect their 
health (Extended Data Fig. 6e). However, it significantly inhibited 
tumour growth (Fig. 3b), involving decreased numbers of BrdU- 
positive cells, which indicates that mannose inhibits cell proliferation 
both in vitro and in vivo (Extended Data Fig. 6f, g). 

We were also interested to know whether mannose can enhance 
chemotherapy in vivo. Tumour-bearing nude mice were treated with 
mannose and doxorubicin either alone or in combination. Although 
none of the treatments affected the weight or visibly affected the health 
of the mice (Extended Data Fig. 6h), we found that either doxorubicin 
or mannose caused a reduction in tumour volume (Fig. 3c). Moreover, 
an even greater effect was observed when doxorubicin was admin- 
istered in combination with mannose (Fig. 3c). Notably, when we 
examined the overall survival of the treated cohorts, those treated with 
doxorubicin plus mannose had a significantly increased life expectancy 
when compared to untreated mice or those treated with either doxoru- 
bicin or mannose alone (Fig. 3d). 

As our data indicated that mannose could potentially be used
clinically, we wanted to understand why different cells vary in their
sensitivity to the sugar. Because mannose affects several metabolic
pathways, we reasoned that sensitivity may be connected to an apical
enzyme involved in sugar metabolism. Our profiling revealed that
mannose sensitivity was roughly inversely correlated with the levels
and activity of PMI, the enzyme that catalyses the interconversion of
mannose-6-phosphate and fructose-6-phosphate¹⁶ (Figs. 1a-c and 4a,
Extended Data Figs. 1b-d, f, 7a). Consequently, we knocked down MPI,
the gene that encodes PMI, in three mannose-insensitive cell lines:
SKOV3, RKO and IGROV1. In each case, MPI knockdown caused
growth retardation upon mannose treatment (Fig. 4b, Extended Data
Fig. 7b-g) and markedly sensitized cells to cell death when mannose
was administered together with cisplatin (Fig. 4c). Conversely, the
overexpression of MPI in a mannose-sensitive cell line rendered the cell line
refractory to the effects of mannose on both cell growth and cell death
(Fig. 4d, Extended Data Fig. 7h-k). Finally, we found that mannose
treatment following MPI knockdown had highly significant effects on
metabolic pathways downstream of the sugar (Fig. 4e, Extended Data
Fig. 7l-o).

Fig. 2 | The combination of chemotherapeutic drugs with mannose
enhances cell death by potentiating the intrinsic pathway of apoptosis
through the downregulation of Mcl-1 and Bcl-XL protein levels.
a-h, All experiments were performed by pre-incubating cells in the
presence of complete DMEM or complete DMEM supplemented with
25 mM sugars for 24 h before the addition of other treatments. a, b, The
percentage of U2OS-E1a (left) and Saos-2 (right) propidium-iodide
(PI)-positive cells after 24 h treatment with 10 µM cisplatin (a) and 1 µg ml⁻¹
of doxorubicin (b) in the presence or absence of 25 mM mannose as
indicated. c, The percentage of U2OS-E1a propidium-iodide-positive cells
treated for 24 h with or without 10 µM cisplatin, with or without 25 mM
mannose and with or without 50 µM of the caspase inhibitor zVAD-FMK.
d, Western blots showing the levels of cleaved PARP (Cl-PARP)
and cleaved caspase-3 (Cl-Casp3) in U2OS-E1a cells after 24 h treatment
with or without 10 µM cisplatin, with or without 25 mM mannose and
with or without 50 µM zVAD-FMK. e, The percentage of empty (left) and
Bax/Bak CRISPR (right) U2OS-E1a propidium-iodide-positive cells treated
with or without 10 µM cisplatin and with or without 25 mM mannose
for 24 h. f, Western blots showing the levels of Mcl-1, Bcl-XL and cleaved
caspase-3 in U2OS-E1a cells after 24 h with or without 10 µM cisplatin,
with or without 25 mM mannose and with or without 50 µM zVAD-FMK.
g, h, Percentage of propidium-iodide-positive U2OS-E1a (empty, Mcl-1
and Bcl-XL overexpressing) cells after 24 h treatment with or without 10 µM
cisplatin and with or without 25 mM mannose. n = 3 independent
experiments (a-c, e, g, h); data are representative of three independent
experiments (d, f). Data are mean ± s.e.m. and were analysed by two-way
ANOVA with Bonferroni correction (a-c, e, g, h). *P < 0.05, **P < 0.01,
***P < 0.001.

We next questioned whether PMI could be modulated to affect the
response of tumours to mannose in vivo. As the immune microenvironment
is important in many therapeutic situations and because
mannose can affect immune cell function¹⁷,¹⁸, we decided to use
immune-competent mice and two syngeneic cell lines (B16-F1 and
LLC) that are ordinarily mannose-insensitive but become
mannose-sensitive upon Mpi knockdown (Fig. 4f, g, Extended Data
Fig. 8a-d). In both cases, allografts formed with Mpi-knockdown
cells were highly sensitive to the oral administration of mannose
(Fig. 4h-k), without visibly affecting the health or weight of the mice
(Extended Data Fig. 8e-h).
Fig. 3 | Mannose impairs tumour growth and induces tumour
regression in combination with chemotherapy. a, CD1-nude mice were
transplanted with KP-4 cells subcutaneously and tumours were grown
for 14 days before positron emission tomography (PET) and magnetic
resonance imaging (MRI) scans were carried out. PET and MRI scans
of mice treated with 200 µl of water (top) or 200 µl of 20% (w/v) mannose
in water (bottom) by oral gavage 20 min before injection of [¹⁸F]FDG into
the tail vein. White dotted circles highlight tumour areas in axial view of
the mice. b, CD1-nude mice were injected with KP-4 cells subcutaneously
and received either normal drinking water or 20% mannose in the
drinking water, plus the same treatment (either normal water or 20%
mannose) by oral gavage three days a week from the third day after tumour
transplantation. Tumour volume (mm³) was measured as indicated.
c, d, CD1-nude mice were injected with KP-4 cells subcutaneously and
tumours were grown for 10 days before the start of mannose treatment.
Mice received either normal drinking water (control and doxorubicin)
or 20% mannose in the drinking water (mannose and doxorubicin +
mannose), plus the same treatment (either water or 20% mannose in
water) via oral gavage three days a week. Doxorubicin treatment started
on day 32 and mice received 5 mg kg⁻¹ by intraperitoneal injection
once a week. c, Tumour volume (mm³) of all treatment groups. d, Graph
representing the survival of the mice in all treatment groups until the end
of the experiment at day 73. The number of mice for each experiment is as
follows: n = 5 (−mannose), n = 4 (+mannose) (a); n = 10 (−mannose),
n = 8 (+mannose) (b); n = 10 per group (c, d). Data are mean ± s.e.m. and
were analysed by two-way ANOVA with Bonferroni correction (b, c) or
log-rank test (two-sided Mantel-Cox test) (d). **P < 0.01, ***P < 0.001.
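
For orientation, the 20% (w/v) mannose solution and the 200 µl gavage volume given in the legend can be converted into a molar concentration and a mass per gavage. The following is a minimal sketch, not a dosing protocol from the paper: the helper names are hypothetical, and the molar mass of mannose (C6H12O6, about 180.16 g/mol) is a standard value that is not stated in the text.

```python
MANNOSE_MOLAR_MASS = 180.16  # g/mol for C6H12O6 (standard value, assumed here)


def percent_wv_to_molar(percent_wv: float, molar_mass: float) -> float:
    """Convert % (w/v), i.e. grams per 100 ml, into mol per litre."""
    grams_per_litre = percent_wv * 10.0
    return grams_per_litre / molar_mass


def solute_mass_mg(percent_wv: float, volume_ul: float) -> float:
    """Mass of solute (mg) in a given volume (µl) of a % (w/v) solution.

    1% (w/v) = 1 g per 100 ml = 10 mg per ml = 0.01 mg per µl.
    """
    return percent_wv * 0.01 * volume_ul


molarity = percent_wv_to_molar(20.0, MANNOSE_MOLAR_MASS)  # about 1.11 mol/l
gavage_mg = solute_mass_mg(20.0, 200.0)                   # 40 mg per 200 µl gavage
```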


We were keen to know whether PMI levels varied in human tumours,
such that the analysis of PMI could potentially be used as a biomarker
for mannose sensitivity. We therefore stained tissue microarrays
containing sections from ovarian, renal, breast, prostate and colorectal
cancers. This revealed that PMI levels not only varied among tumours
from the same tissue, but also between tissues (Fig. 4l, Extended Data
Fig. 9a). PMI levels did not, however, have prognostic significance in
breast and colon cancer, presumably because normal serum levels of
mannose are low compared to glucose¹⁹ (Extended Data Fig. 9b-d).

Most notable from our analyses was the fact that colorectal tumours
generally have very low PMI levels (Fig. 4l), which indicates that they
may be broadly sensitive to mannose. To explore this, we used an
inflammation-driven model of colorectal cancer and a genetically
engineered mouse model driven by two genes that are frequently altered
in this disease (Kras and Apc)²⁰. In both models, mice maintained
on drinking water containing 20% mannose had significantly fewer
tumours at the clinical end point (Fig. 4m, n), and notably, mannose
had no negative effect on the health or weight of the mice over the time
examined (Extended Data Fig. 9e, f).

In contrast to its effects on glucose metabolism, mannose does not
decrease the uptake of amino acids or fatty acids, and although
mannose reduces glucose-dependent serine and glycine synthesis, this
contributes only marginally to total cellular serine and glycine pools
(Extended Data Fig. 10a-f). Mannose does not invoke an
endoplasmic-reticulum stress response, but affects transcription, translation and
autophagy, although these effects were reversed by overexpression of
MPI, which indicates that they are downstream of glucose
metabolism (Extended Data Fig. 10g-l). Moreover, the ablation of autophagy
did not affect mannose sensitivity, showing that autophagy inhibition
is not the mechanism underlying the effects of the sugar (Extended
Data Fig. 10m, n). In summary, we conclude that mannose represents
a well-tolerated means to interfere with glucose metabolism that could
potentially be used clinically, either alone or in combination with other
forms of cancer therapy.

Fig. 4 | PMI levels dictate mannose sensitivity. a, Western blot showing
the levels of PMI and ERK2 in a panel of ten cancer cell lines. b, Growth
curves of SKOV3 cells in complete DMEM medium with or without
supplementation of 25 mM mannose after transient transfection with
two non-targeting (NTC) and two PMI-targeting short interfering
RNAs (siRNAs) individually for 48 h. c, Cell death represented as the
percentage of propidium-iodide-positive SKOV3 siRNA-transfected cells
that had been pre-incubated with or without 25 mM mannose in regular
DMEM for 24 h, before either 10 µM cisplatin was added or no cisplatin
was added with incubation for a further 24 h. d, Overexpression of PMI
renders U2OS-E1a insensitive to mannose. Growth curves of U2OS-E1a
overexpressing PMI (U2OS-PMI) and U2OS-E1a cells expressing vector
control (U2OS) after culture in either DMEM or DMEM containing 25 mM
mannose. e, Percentage of metabolite content of hexoses-6-phosphate
in SKOV3 cells transfected with siRNA for 48 h before 6 h incubation in
complete DMEM medium with or without supplementation of 25 mM
mannose. Cells treated with NTC 1 without mannose were normalized
to 100%. f, g, Knockdown of Mpi renders B16-F1 and LLC cells sensitive
to mannose. The indicated cells were incubated with or without 25 mM
mannose for 24 h before cell counting. h-k, B16-F1 or LLC cells expressing
Mpi-targeting or control shRNAs were injected into the flanks of C57BL/6
mice. Mice were maintained either with or without 20% mannose in
drinking water and tumour growth was monitored over time (n = 10 mice
per group). l, MPI expression levels in tissue microarrays from ovarian
(n = 45), renal (n = 180), breast (n = 159), prostate (prost.) (n = 155) and
colorectal (colorect.) (n = 216) tumours. m, Mice (n = 14 per group) were
subjected to azoxymethane plus dextran sodium sulfate treatment for
68 days. Mice were treated with normal drinking water or with 20%
mannose in drinking water until the clinical end point. Tumours were
counted in the colon of each mouse. n, VillinCreER Apcfl/+ KrasG12D/+ mice
were aged until the clinical end point. Mice were treated with normal
drinking water or with 20% mannose in drinking water from four days
post-induction until the clinical end point (n = 9 mice (−mannose),
n = 8 mice (+mannose)). Tumours were counted in the colon of each
mouse. n = 3 independent experiments (b, c, e, f); n = 5 independent
experiments (d). Data represent one independent experiment performed
in technical triplicate (g) or are representative of two independent
experiments (a). Data are mean ± s.e.m. and are analysed by a one-sided
Mann-Whitney U-test. ***P < 0.001. Data were analysed by two-way
ANOVA with Bonferroni correction (c, e), multiple two-sided unpaired
t-test with Holm-Sidak correction (d), two-way ANOVA with Tukey
correction (h-k), one-way ANOVA with Bonferroni correction (l),
unpaired two-tailed Student's t-test (m) and one-sided Mann-Whitney
U-test (n). *P < 0.05, **P < 0.01, ***P < 0.001, ****P < 0.0001.
METHODS


Cell culture, transfections and infections. All cell lines were from Beatson
Institute stocks and were originally obtained from American Type Culture
Collection or European Collection of Authenticated Cell Cultures repositories,
apart from PATU-8902 (DSMZ, ACC-179) and KP-4 (RIKEN, RCB1005). A549,
IGROV-1, Saos-2, U2OS, U2OS-E1a, SKOV-3, RKO and PATU-8902 cells were
grown in DMEM high glucose supplemented with 10% FBS, glutamine
(0.292 mg ml⁻¹) and penicillin (100 units per ml)-streptomycin (100 µg ml⁻¹) (all from Life
Technologies). U2OS-E1a cells were generated in our laboratory and have been
previously described²¹. K562 cells were grown in RPMI-1640 and KP-4 cells were
grown in IMDM (Life Technologies) with 20% FBS, supplemented with penicillin
(100 units per ml)-streptomycin (100 µg ml⁻¹). The B16-F1 mouse skin melanoma
cell line was a gift from L. Machesky (Cancer Research UK Beatson Institute)
and cells were maintained in DMEM (Thermo Fisher Scientific, 21969035)
supplemented with 10% FBS (Thermo Fisher Scientific, 10270106), 2 mM
L-glutamine (Thermo Fisher Scientific, 25030032), and penicillin-streptomycin
(Thermo Fisher Scientific, 15140122). LLC mouse Lewis lung carcinoma cells were
a gift from S. Zanivan (Cancer Research UK Beatson Institute) and were
maintained in RPMI-1640 (Thermo Fisher Scientific, 31870074), 10% FBS, 10 mM
HEPES (Thermo Fisher Scientific, 15630080), 2 mM L-glutamine and
penicillin-streptomycin. These cell lines were confirmed to be free of mycoplasma.

Cells were transfected using calcium phosphate precipitates as previously
described'®. U2OS-E1a cells were infected with virus containing
pBabe-puro-empty or pBabe-puro-Bcl-XL, following the same protocol as previously
published²¹. U2OS-E1a cells were infected with virus containing pLZRS empty or
pLZRS HA-Mcl-1 (provided by S. Tait)”. U2OS-E1a-infected cells were selected in
DMEM 10% FBS containing 600 µg ml⁻¹ of neomycin for 2 weeks. U2OS control
and Bax/Bak CRISPR cells were provided by S. Tait.

The following siRNAs were used: PMI siRNA-1 (Dharmacon, J-011729- 
05-0002), PMI siRNA-2 (Dharmacon, J-011729-06-0002), PMI siRNA-3 (Dharmacon, 
J-011729-07-0002), PMI siRNA-4 (Dharmacon, J-011729-08-0002), NTC1 
(Dharmacon, D-001810-03-20) and NTC2 (Dharmacon, D-001810-04-20). 

lentiCRISPR v2 was a gift from F. Zhang (Addgene plasmid 52961)*. The
following single-guide RNA sequences were used in the experiments. NTC 1:
GTAGCGAACGTGTCCGGCGT; NTC 2: GCTTGAGCACATACGCGAAT;
ATG5: AAGAGTAAGTTATTTGACGT; ATG7: GAAGCTGAACGAGTATCGGC;
Casp8: GCCTGGACTACATTCCGCAA; FADD: TTCCTATGCCTCGGGCGCGT;
BAX: AGTAGAAAAGGGCGACAACC; BAK: GCCATGCTGGTAGACGTGTA;
NOXA: TCGAGTGTGCTACTCAACTC.

The following shRNAs were used: pGIPZ-non-targeting control (NTC) 
(Dharmacon, RHS4346), PMI 1 (Dharmacon, RMM4431-200352145, Clone ID: 
V2LMM_110673) and PMI 2 (Dharmacon, RMM4431-200355616, Clone ID: 
V2LMM_203337). 

pLX304 was a gift from D. Root (Addgene plasmid 25890). Human PMI cDNA 
was amplified using pCMV-Sport-MPI (Dharmacon, MHS6278-202802339) 
as a template and inserted into pDONR221 vector (Thermo Fisher Scientific, 
12536017) using Gateway BP Clonase II Enzyme mix (Thermo Fisher Scientific, 
11789020). PMI cDNA was then transferred into a pLX304 destination vector 
using Gateway LR Clonase II Enzyme mix (Thermo Fisher Scientific, 11791020). 
Lentivirus production and infection were carried out as described previously”*. 
Cell culture treatments. Mannose treatment. Cells were seeded in six-well plates 
and incubated overnight at 37 °C. The following day, the medium was replen- 
ished with fresh full growth medium (DMEM, RPMI-1640 or IMDM) containing 
25 mM D-mannose (DMEM or IMDM) or 11.11 mM D-mannose (RPMI-1640). 
Other sugars were added to a concentration of 25 mM: D-glucose (Sigma-Aldrich, 
G8270), D-fructose (Sigma-Aldrich, F3510), D-galactose (Sigma-Aldrich, G5388), 
L-fucose (Sigma-Aldrich, F2252 and Cayman Chemical, 16479). New stocks of 
1 M mannose in Milli-Q water were prepared every two weeks and sterilized by 
filtering through a 0.22-µm pore filter. For control conditions, the same volume 
of Milli-Q water was added to the medium. Cells were left for at least 24 h for 
cell-death experiments or for 6 h for metabolic experiments. Cell death was blocked 
by treatment with zVAD-FMK (Apax Labs, A1902). Where indicated, cells were 
also treated with 2-deoxy-D-glucose (Sigma-Aldrich, D8375), tunicamycin 
(Sigma-Aldrich, T7765) or chloroquine (Sigma-Aldrich, C6628). 
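
The volume of 1 M mannose stock needed to reach the working concentrations above follows from C1 × V1 = C2 × V2. The following is a minimal sketch of that arithmetic. The 2 ml final well volume is an assumption for illustration (the text does not state the volume used for mannose treatment), and `stock_volume_ul` is a hypothetical helper name.

```python
def stock_volume_ul(final_mM, final_volume_ml, stock_M=1.0):
    """Volume of stock (in µl) such that C_stock * V_stock = C_final * V_final.

    final_volume_ml is the total volume after the stock has been added.
    """
    if final_mM < 0 or final_volume_ml <= 0 or stock_M <= 0:
        raise ValueError("concentrations and volumes must be positive")
    stock_mM = stock_M * 1000.0
    volume_ml = final_mM * final_volume_ml / stock_mM
    return volume_ml * 1000.0


# Assumed 2 ml per well; 25 mM for DMEM or IMDM, 11.11 mM for RPMI-1640.
dmem_ul = stock_volume_ul(25.0, 2.0)
rpmi_ul = stock_volume_ul(11.11, 2.0)
```

Under this assumption, the same volume of Milli-Q water would be added to control wells, as described in the text.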

¹³C-labelled sugars. The day after seeding, cells were washed three times with abundant 
PBS before adding glucose-free DMEM (supplemented with 10% dialysed 
FBS, 2 mM glutamine, 100 units per ml of penicillin and 100 µg ml⁻¹ of streptomycin). 
Depending on the experiment, this medium could contain 5 mM of 
1,2-¹³C₂-glucose alone or together with ¹³C₆-mannose, or 5 mM ¹³C₆-mannose alone. 
Stocks of 1,2-¹³C₂-glucose and ¹³C₆-mannose were prepared at a concentration of 
0.5 M in Milli-Q water before sterilization by filtering through a 0.22-µm pore filter. 
Chemotherapeutic drugs. Cells were plated overnight and then incubated for the 
indicated times in control or mannose-containing medium. After one day of incubation, 
fresh control and mannose-containing media were prepared and drugs were 
added as described for each experiment. The drugs used were cisplatin (Sigma-Aldrich, 
C2210000) and doxorubicin (Sigma-Aldrich, D1515). 

RT-qPCR. RT-qPCR was carried out as previously described™ using primers 
for MCL1 (Qiagen, QT00094122) and BCL2L1 (Qiagen, QT00997423). mRNA 
levels were determined by the relative standard curve method, normalized to 18S. 
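
In the relative standard curve method, a dilution series is used to fit Ct against log10(quantity) for each gene, sample quantities are read back from those curves, and the target quantity is divided by the reference (here 18S) quantity. The sketch below illustrates that calculation in plain Python. The function names and the example numbers are assumptions for illustration, not values from this study.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def quantity_from_ct(ct, slope, intercept):
    """Invert Ct = slope * log10(quantity) + intercept."""
    return 10 ** ((ct - intercept) / slope)


def relative_expression(ct_target, ct_reference, curve_target, curve_reference):
    """Target quantity divided by reference-gene quantity (for example 18S)."""
    target = quantity_from_ct(ct_target, *curve_target)
    reference = quantity_from_ct(ct_reference, *curve_reference)
    return target / reference


# Hypothetical tenfold dilution series: log10(relative quantity) versus Ct.
log_quantities = [0.0, -1.0, -2.0, -3.0]
ct_values = [30.0, 33.32, 36.64, 39.96]
curve = fit_line(log_quantities, ct_values)
```

In practice each gene gets its own curve; the same curve is reused for both genes in the test values only to keep the example short.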
Western blotting. Protein extraction was performed as previously described”. 
Protein lysates were separated by SDS-PAGE and blotted onto nitrocellulose 
membranes. Western blot analysis was performed according to standard tech- 
niques as previously described”>. The following antibodies were used at a dilution 
of 1:1,000 unless otherwise stated: anti-β-actin (Abcam, ab8227), anti-Mcl-1 (Cell 
Signaling, 4572), anti-ERK2 (Santa Cruz, sc-154), anti-Bcl-XL (Cell Signaling, 2762), 
anti-HSP-90β (Santa Cruz, sc-1057), anti-PARP (Cell Signaling, 9542), anti-cleaved 
caspase-3 (Cell Signaling, 9664), anti-FADD (BD Transduction Laboratories, 
F36620), anti-caspase-8 (Cell Signaling, 4790), anti-Bax (BD Transduction 
Laboratories, 610983), anti-Bak (Cell Signaling, 6947), anti-Bim (Cell Signaling, 
2993), anti-Noxa (Novus Biologicals, NB-600-1159), anti-phospho-AMPKα 
(Cell Signaling, 2535), anti-AMPKα (Cell Signaling, 5832), anti-PMI (Abcam, 
128115), anti-LC3B (Cell Signaling, 2775S, 1:1,500), anti-β-actin (Cell Signaling, 
4670S, 1:2,000), anti-ATG5 (Cell Signaling, 12994S), anti-ATG7 (Cell Signaling, 
8558S), anti-Bip (also known as GRP78; Cell Signaling, 3177S) and anti-p62 
(BD Biosciences, 610833, 1:2,000). Mouse PMI protein was detected using rabbit 
polyclonal PMI antibody (Proteintech, 14234-1-AP). Validation was based on 
information provided in the manufacturers’ datasheets. In addition, as indicated 
in the manuscript, we used RNAi or CRISPR-Cas9 to validate the antibodies used 
to detect the following proteins: Bax, Bak, caspase-8, FADD, PMI, Atg5 and Atg7. 
Metabolic extraction of intracellular metabolites. Cells were seeded at a concen- 
tration of 100,000 cells per well in six-well plates. The next morning, the medium 
was replenished with fresh full medium and cells were kept under these conditions 
for another 24 h to stabilize their metabolism. Approximately 36-40 h after being 
plated, cells were treated under different conditions as described, using full growth 
medium or glucose-free medium in the presence or absence of unlabelled sugars 
or labelled sugars for 6 h. After a 6-h incubation at 37 °C, intracellular metabolites 
were extracted. 

The medium was aspirated and six-well plates were placed on ice and washed 
thoroughly with 4 °C PBS three times before the addition of 500 µl of extraction 
solvent (50% methanol, 30% acetonitrile, 20% Milli-Q water) to each well. Plates 
were then agitated at 4 °C for 5 min to successfully extract intracellular metabolites 
and then centrifuged at 16,100g for 10 min at 4 °C. Supernatants were transferred 
into HPLC vials and stored at −80 °C before LC-MS analysis. 

An Exactive Orbitrap mass spectrometer (Thermo Scientific) was used together 
with a Thermo Scientific Accela HPLC system. The HPLC setup consisted of a 
ZIC-pHILIC column (SeQuant, 150 mm × 2.1 mm, 5 µm, Merck KGaA) with a 
ZIC-pHILIC guard column (SeQuant, 20 mm × 2.1 mm) and an initial mobile 
phase of 20% 20 mM ammonium carbonate, pH 9.4 and 80% acetonitrile. Cell 
extracts (5 µl) were injected and metabolites were separated over a 15-min mobile 
phase gradient, decreasing the acetonitrile content to 20%, at a flow rate of 
200 µl min⁻¹ and a column temperature of 45 °C. The total analysis time was 23 min. 
For longer runs of 37 min, a 30-min gradient with the same solvents was used, 
at a flow rate of 100 µl min⁻¹ and a column temperature of 30 °C”®. All metabolites 
were detected across a mass range of 75-1,000 m/z using the Exactive mass 
spectrometer at a resolution of 25,000 (at 200 m/z), with electrospray ionization 
and polarity switching to enable both positive and negative ions to be determined 
in the same run. Lock masses were used and the mass accuracy obtained for 
all metabolites was below 5 p.p.m. Data were acquired with Thermo Xcalibur 
software (version 2.2). 
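
Mass accuracy in parts per million is the difference between measured and theoretical m/z divided by the theoretical m/z, times 10⁶. A minimal sketch of checking a peak against the 5 p.p.m. limit quoted above follows. The example m/z values are illustrative assumptions (259.0224 is taken as the theoretical [M−H]⁻ of a hexose-6-phosphate), and the function names are hypothetical.

```python
def ppm_error(measured_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(measured_mz, theoretical_mz, tolerance_ppm=5.0):
    """True if the absolute mass error is below the tolerance."""
    return abs(ppm_error(measured_mz, theoretical_mz)) < tolerance_ppm


# Illustrative values only: a measured peak versus an assumed theoretical mass.
example_ppm = ppm_error(259.0230, 259.0224)
```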

The peak areas of different metabolites were determined using Thermo 
TraceFinder software (version 3.2), in which metabolites were identified by the 
exact mass of the singly charged ion and by known retention time on the HPLC 
column. Commercial standards of all metabolites detected had been analysed previously 
on this LC-MS system with the pHILIC column. The ¹³C labelling patterns 
were determined by measuring peak areas for the accurate mass of each isotopologue 
of many metabolites. Intracellular metabolites were normalized to the protein 
content of the cells, measured at the end of the experiment by the Lowry assay”’. As 
the proteins precipitate upon addition of the metabolite extraction solvent, protein 
content was measured in the wells after the metabolites were extracted. 
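
The two calculations described above, normalizing a peak area to protein content and expressing isotopologue peak areas as fractions of the metabolite pool, can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, the numbers are made up, and no correction for natural ¹³C abundance is applied (the text does not say whether one was used).

```python
def normalize_to_protein(peak_area, protein_ug):
    """Peak area per microgram of protein."""
    if protein_ug <= 0:
        raise ValueError("protein content must be positive")
    return peak_area / protein_ug


def isotopologue_fractions(peak_areas):
    """Fraction of the total pool for each isotopologue.

    peak_areas maps labels such as "m+0" or "m+2" to raw peak areas.
    """
    total = sum(peak_areas.values())
    if total <= 0:
        raise ValueError("total peak area must be positive")
    return {label: area / total for label, area in peak_areas.items()}


# Made-up example: a metabolite pool that is mostly m+2 labelled.
fractions = isotopologue_fractions({"m+0": 2.0e5, "m+2": 8.0e5})
```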
Translation assay. Cells at 50% confluency were pretreated with or without 25 mM 
mannose for 24 h or with 100 µg ml⁻¹ cycloheximide for 1 h. [³⁵S]methionine 
(1 MBq) (Perkin Elmer, EasyTag EXPRESS³⁵S Protein Labelling Mix, 
NEG772002MC) was added to the culture medium (2 ml per well in a six-well plate) 
for 30 min. Cells were washed in ice-cold PBS, then lysed in lysis buffer (10 mM 
Tris pH 7.5, 50 mM NaCl, 0.5% NP40, 0.5% SDS, benzonase (Sigma-Aldrich 
E1014, 2 µl per 10 ml of lysis buffer)). Proteins were precipitated in 25% trichloroacetic 
acid at 4 °C for 30 min. The precipitates were washed on glass-fibre filters 
(Whatman 934-AH, 1287-024) with 70% ethanol followed by acetone, dried, and 
incorporation of ³⁵S was quantified in a liquid scintillation counter. The results 
were calculated as relative ³⁵S incorporation per 10° cells. 

Transcription assay. Cells at 50% confluency were pretreated with 25 mM mannose 
or with a control for 24 h, or with 5 µM actinomycin D for 1 h. [³²P]UTP (1.1 MBq) 
(Perkin Elmer, BLU007H001MC) was added to the culture medium (2 ml per well 
in a six-well plate) for 6 h. Cells were washed in PBS and mRNA was prepared using 
the Dynabeads mRNA DIRECT Kit (Life Technologies, 61012). ³²P-labelled mRNA 
was quantified in a liquid scintillation counter and the results were calculated as 
relative [³²P]UTP incorporation per 10° cells. 

PMI enzymatic assay. Cells were grown to confluence, washed with PBS and 
collected by centrifugation (5 min, 150g at 4 °C). Cell pellets were lysed by three 
freeze-thaw cycles. Post-nuclear protein fractions (40 µg) were used to determine 
PMI activities in each cell line tested by means of a coupled enzymatic reaction. 
In brief, samples were incubated in a buffer containing 200 mU of PGI (Roche, 
10 127 396 001), 500 mM glucose-6-phosphate dehydrogenase, 1 mM NADP⁺, 
40 mM Tris-HCl pH 7.4, 6 mM MgCl₂, 5 mM Na₂HPO₄/KH₂PO₄. Reactions were 
initiated by the addition of 1 mM mannose-6-phosphate, and the production of 
NADPH/H⁺ was assessed for 2 h at room temperature by measuring the optical 
density at 340 nm (OD340nm). In parallel, western blots directed against PMI and 
ERK2 were performed to examine the correlation between PMI activities and PMI 
expression levels in each cell line analysed. 
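
In a coupled assay of this kind, the rise in absorbance at 340 nm reports NADPH formation, which can be converted to a concentration with the Beer-Lambert law using the standard NADPH molar extinction coefficient of 6,220 M⁻¹ cm⁻¹. The sketch below shows that conversion together with a simple normalization of one cell line's reading to a reference line. The 1 cm path length is an assumption (plate readers usually differ), and the function names are hypothetical; the text itself reports OD values rather than concentrations.

```python
NADPH_EXTINCTION_PER_M_PER_CM = 6220.0  # molar extinction coefficient at 340 nm


def nadph_micromolar(delta_od340, path_length_cm=1.0):
    """NADPH concentration (µM) from an absorbance change, via Beer-Lambert.

    path_length_cm=1.0 is an assumed value, not taken from the text.
    """
    molar = delta_od340 / (NADPH_EXTINCTION_PER_M_PER_CM * path_length_cm)
    return molar * 1e6


def relative_activity(sample_od340, reference_od340):
    """Activity of a sample expressed relative to a reference cell line."""
    if reference_od340 <= 0:
        raise ValueError("reference reading must be positive")
    return sample_od340 / reference_od340
```

For example, an absorbance change of 0.622 over a 1 cm path corresponds to about 100 µM NADPH under these assumptions.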

Proteasome assay. Cells were seeded 24 h before 24-h treatments using media 
either with or without mannose supplementation (25 mM). A luminescence-based 
assay (Proteasome-Glo chymotrypsin-like cell-based assays, Promega) was per- 
formed according to the manufacturer's protocol. The measured proteasome activi- 
ties were normalized to cell number by counting cells at the end of each experiment. 
Cells were treated with 10 µM MG132 as a control for the specificity of the assay. 
Lowry assay. The Lowry assay was used to determine the total protein content for 
each one of the triplicates from the metabolic experiments. First, 500 µl of solution 
A (70% Milli-Q water, 20% 5 M NaOH, 10% 2,5-dimethoxy-4-chloroamphetamine) 
was added to each well from a six-well plate and left under agitation for 20 min. 
Next, 5 ml of solution B (0.5 g NaCu-EDTA, 40 g Na₂CO₃, 8 g NaOH in 2 l) was 
added to each well and left for 40 min. Then, 500 µl of Folin reagent was added to 
each well and left for a minimum of 15 min until a blue colour was observed in 
each well. One or two six-well plates were used to construct a standard curve of 
protein concentration with BSA. Finally, 200 µl from each well was separately transferred 
to a 96-well plate and the absorbance was measured at 750 nm. The protein 
concentration was then determined by fitting the equation of the standard curve 
and interpolating from the absorbance of each well. 
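
Reading protein content off a BSA standard curve amounts to fitting absorbance against known protein amounts and inverting the fit for each sample. The following sketch assumes a linear fit, which is a simplification (the Lowry response can curve at higher protein amounts, and the text does not state which model was used). Function names and the standard values are illustrative.

```python
def fit_standard_curve(protein_ug, absorbance):
    """Least-squares fit of absorbance = slope * protein + intercept."""
    n = len(protein_ug)
    mean_x = sum(protein_ug) / n
    mean_y = sum(absorbance) / n
    sxx = sum((x - mean_x) ** 2 for x in protein_ug)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(protein_ug, absorbance))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def protein_from_absorbance(a750, slope, intercept):
    """Invert the fitted line to estimate protein (µg) from absorbance."""
    return (a750 - intercept) / slope


# Made-up BSA standards (µg) and their absorbance at 750 nm.
standards_ug = [0.0, 25.0, 50.0, 100.0]
standards_a750 = [0.05, 0.15, 0.25, 0.45]
slope, intercept = fit_standard_curve(standards_ug, standards_a750)
```

Samples whose absorbance falls outside the range of the standards would need dilution rather than extrapolation.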

Flow cytometry. Flow cytometry for unfixed cells (cell-death assay) was conducted 
as previously described, and a FACSCalibur or ATTune NXT flow cytometer was 
used for the analysis. 

Animal experiments. All in vivo xenograft experiments were performed using 
six-week old female CD1-nude mice or C57/BL6J wild-type mice as approved by 
the Glasgow University Animal Welfare and Ethical Review Body and in accordance 
with UK Home Office guidelines. Mice were placed five per cage with free access 
to water and food (chow diet). Experimental cohort sizes were based on previous 
similar studies that have given statistical results while also respecting the limited 
use of animals in line with the 3R system: replacement, reduction, refinement. In 
no cases were mice maintained once the tumour size limit that was permitted in 
our Home Office license (15 mm in any direction) had been reached. All treatment 
studies were randomized, but did not involve blinding. 

KP-4 cells (5 × 10° cells) were injected subcutaneously with 100 µl of Matrigel in 
either the right flank or both the left and the right flanks. LLC cells (1 × 10° cells) 
and B16-F1 cells (1 × 10° cells) were injected subcutaneously in 200 µl or 100 µl 
of PBS suspension, respectively. Mice were weighed and tumours were measured 
using callipers three times per week (usually on Monday, Wednesday and Friday 
of each week). The calculation of tumour volume was as follows: (L × S²)/2 (where 
L is the longest length and S is the shortest length). 
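
The tumour volume formula above, (L × S²)/2, translates directly into code. This is a minimal sketch; it assumes calliper measurements in millimetres, giving a volume in mm³, and the function name is hypothetical.

```python
def tumour_volume(longest, shortest):
    """Tumour volume as (L * S**2) / 2, in cubed units of the inputs."""
    if longest < 0 or shortest < 0:
        raise ValueError("lengths must be non-negative")
    if shortest > longest:
        raise ValueError("shortest length cannot exceed longest length")
    return longest * shortest ** 2 / 2.0


# Example: L = 10 mm and S = 8 mm.
example_volume = tumour_volume(10.0, 8.0)
```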

For mannose treatment, normal drinking water was exchanged for 200 ml of 
20% mannose in drinking water (w/v) and it was replaced once every week. Mice 
received mannose by oral gavage (200 µl) three times per week from the same stock 
of 20% mannose in water. 

For doxorubicin treatments, each mouse received intraperitoneal injections 
at a dose of 5 mg kg⁻¹ once per week. Stocks of 1 mg ml⁻¹ doxorubicin 
(Sigma-Aldrich, D1515) were prepared in Milli-Q water. 
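
The injection volume for a weight-based dose follows from the dose, the body weight and the stock concentration. A sketch of that arithmetic for the doxorubicin regimen above is given below. The 25 g body weight is an illustrative assumption, and `injection_volume_ul` is a hypothetical helper.

```python
def injection_volume_ul(body_weight_g, dose_mg_per_kg=5.0, stock_mg_per_ml=1.0):
    """Volume of stock (µl) that delivers the requested mg/kg dose."""
    if body_weight_g <= 0 or dose_mg_per_kg < 0 or stock_mg_per_ml <= 0:
        raise ValueError("inputs must be positive")
    dose_mg = dose_mg_per_kg * body_weight_g / 1000.0
    return dose_mg / stock_mg_per_ml * 1000.0


# Assumed 25 g mouse, 5 mg/kg dose, 1 mg/ml stock.
example_volume_ul = injection_volume_ul(25.0)
```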

For azoxymethane (AOM) + dextran sodium sulfate (DSS) experiments, 
mice were injected with a single dose of AOM (Sigma-Aldrich, A5486) at 
10 mg kg⁻¹. Five days after AOM injection, mice received three cycles of 1.5% 
DSS (MP Biomedical, 0216011080) in water; each cycle lasted five consecutive 
days, with one week in between each cycle. In the week between each cycle, mice 
received normal water or 20% (w/v) mannose water. After the last cycle of DSS, 
mice were maintained with normal water or mannose for two weeks before being 
killed on day 68. 
Tissue collection. All mice were killed by CO₂-mediated euthanasia” before 
collecting any tissue sample. Mice were placed in a cage and exposed to CO₂ for 
100 s before neck dislocation. 

Xenografted tumours were removed and detached from the skin and cut in 
half. One half of each tumour was fixed for 24-30 h in 10% neutral buffered 
formalin at room temperature. Formalin was then exchanged for 70% ethanol 
before histology. 

The colons of mice from AOM + DSS carcinogenesis experiments were 
collected and fixed in formalin for histological processing. 

For the experiments with the genetically engineered mouse model, 
Villin***8 Apc" + Kras6?!+ mice*® aged 6-12 weeks were given a single intraperitoneal 
injection of 80 mg kg⁻¹ tamoxifen (Sigma-Aldrich, T5648). Four 
days post-induction, drinking water was exchanged for fresh drinking water or 
20% mannose in drinking water (w/v) and given freely; this was replaced every 
week. Mice were aged until the clinical end point—when they displayed anaemia, 
hunching or weight loss. Colons were pinned out into 10% neutral buffered 
formalin and the number of tumours was counted. 

Blood sampling and blood metabolic extraction. To measure blood mannose 
levels, tail tipping was performed. Blood samples were directly frozen using dry 
ice and stored at −80 °C until metabolic extraction was performed. 

For metabolic extraction of whole blood, 2–5 µl of each sample was diluted in 
100–250 µl of metabolic extraction buffer on ice for 5 min (1:50 dilution). Then, all 
samples were centrifuged at 17,949g at 4 °C for 15 min. Finally, 100 µl of the supernatant 
from each sample was transferred into HPLC vials and stored at −80 °C 
until LC-MS analysis. 
PET and MRI scanning. Mice with KP-4 cell xenografts (n = 9, weight 24.6 ± 1.8 g) 
received either 200 µl of 20% w/v mannose in water (treatment group) or normal 
water (control group) by oral gavage 20 min before the injection of [¹⁸F]FDG. 
[¹⁸F]FDG (12.9 ± 1 MBq) in 200 µl of normal saline was administered via an 
intravenous bolus injection in the tail vein. After an uptake phase of 30 min, PET 
and MRI scans were sequentially performed using a nanoScan (Mediso Ltd) PET/ 
MRI (1T) scanner. Mice were maintained under 2-2.5% isoflurane in medical 
air during the injection and imaging procedures. Static PET acquisitions were 
performed for 15 min, and subsequently whole body T1 GRE 3D Multi-FOV MRI 
scans (slice thickness 0.50 mm, repetition time (TR) 10 ms, echo time (TE) 2.3 ms, 
flip angle 12°) were performed to obtain anatomical references. For quantitative 
assessments of scans, regions of interest (ROI) were manually drawn around the 
edge of the tumour xenograft on MRI scans by visual inspection using PMOD 
software version 3.504 (PMOD Technologies) and the same ROI was copied on 
the respective PET scans. Tumour ROIs were slightly different between scans 
depending on the positions and angles of the mice on the scanner, therefore 
separate ROIs for each scan were drawn. The percentage injected dose per ml 
(%ID ml⁻¹) was calculated using the formula %ID ml⁻¹ = ROI activity (kBq 
ml⁻¹) / injected dose × 100%. Data were reported as average %ID ml⁻¹ ± s.d. 
Student's t-tests were used when comparing data between mannose-treated and 
control mice. 
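
The %ID ml⁻¹ formula above divides the activity concentration in the region of interest by the injected dose and multiplies by 100. The sketch below makes the unit handling explicit by converting the injected dose from MBq to kBq. It applies no decay correction, which is an assumption of this illustration (the text does not say how decay was handled), and the example ROI activity is made up.

```python
def percent_id_per_ml(roi_kbq_per_ml, injected_dose_mbq):
    """Percentage of injected dose per ml: ROI activity / injected dose * 100."""
    if injected_dose_mbq <= 0:
        raise ValueError("injected dose must be positive")
    injected_kbq = injected_dose_mbq * 1000.0
    return roi_kbq_per_ml / injected_kbq * 100.0


# Made-up ROI activity of 258 kBq/ml with the nominal 12.9 MBq injected dose.
example_percent = percent_id_per_ml(258.0, 12.9)
```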

Tissue microarray PMI staining. Tissue microarrays (TMAs) were first placed in 
xylene for 5 min before three washes of 1 min each (two in ethanol and one in 70% 
ethanol). TMAs were washed for 5 min in deionized water and then held for 25 min 
at 98 °C in a PT module using pH 6 sodium citrate retrieval buffer. They were 
further washed once in tris-buffered Tween (TbT) before blocking endogenous 
peroxidase for 5 min. TMAs were washed again using TbT and stained with 
PMI antibody (1C7, 1:50 dilution) for 1 h. TMAs were washed in TbT before 
and after incubation for 30 min in a mouse EnVision detection system. A 10-min 
incubation in 3,3’-diaminobenzidine tetrahydrochloride was then performed. 
TMAs were again washed with deionized water (1 min) before and after incuba- 
tion with haematoxylin Z, followed by post-incubation with 1% acid alcohol. This 
was followed by a 30-s wash with deionized water, and then incubation with Scott’s 
Tap Water substitute (1 min) and deionized water (1 min). 

BrdU staining. Xenografted tumours were embedded in paraffin, after which 
sections of an appropriate thickness were cut and kept at 60 °C overnight. Sections 
were first placed in xylene for 5 min before three washes of 1 min each (two in 
ethanol and one in 70% ethanol). Sections were washed for 5 min in deionized 
water and then held for 25 min at 98 °C in a PT module using pH 6 sodium 
citrate retrieval buffer. They were further washed once in TbT before blocking 
endogenous peroxidase for 5 min. Sections were washed in TbT before and after 
incubation with BrdU antibody (BD Biosciences 347580, 1:200 dilution) for 
35 min. Sections were washed in TbT before and after incubation for 30 min in a 
mouse EnVision detection system. A 10-min incubation in 3,3’-diaminobenzidine 
tetrahydrochloride was then performed. Sections were again washed with deionized 
water (1 min) before and after incubation with haematoxylin Z, followed by 
post-incubation with 1% acid alcohol. This was followed by a 30-s wash with 
deionized water, and then incubation with Scott’s Tap Water substitute (1 min) 
and deionized water (1 min). 
Tissue microarrays. TMAs from ovarian, breast, colorectal, prostate and renal 
cancer were stained for PMI. The microarrays were scanned using a digital slide 
scanner (SCN 400 E, Leica Biosystems) and were scored for PMI expression on 
the basis of the percentage of tumour that showed negative, low, medium or high 
expression. Each sample was scored with a number between 0 and 300 according 
to the following equation: Score = (0 × negative) + (1 × low) + (2 × medium) 
+ (3 × high). 
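
The scoring equation above weights the percentage of tumour in each staining category by 0, 1, 2 or 3, giving a value between 0 and 300. A minimal sketch follows. It assumes the four percentages sum to 100, and the function name `tma_score` is hypothetical.

```python
def tma_score(negative, low, medium, high):
    """Weighted staining score: 0*negative + 1*low + 2*medium + 3*high.

    Arguments are percentages of tumour area and must sum to 100.
    """
    parts = (negative, low, medium, high)
    if any(p < 0 for p in parts) or abs(sum(parts) - 100.0) > 1e-6:
        raise ValueError("percentages must be non-negative and sum to 100")
    return 0 * negative + 1 * low + 2 * medium + 3 * high


# Example: 10% negative, 20% low, 30% medium, 40% high.
example_score = tma_score(10, 20, 30, 40)
```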

Statistical analysis and reproducibility. All presented data were analysed using 
GraphPad Prism software. Four different statistical analyses were performed 
depending on the data from the different experiments shown, typically one-way 
ANOVA, two-way ANOVA, log-rank (Mantel-Cox) test and Student's t-tests that 
could be paired or unpaired and one-tailed or two-tailed. Four levels of signifi- 
cance were determined: *P < 0.05, **P < 0.01, ***P < 0.001, ****P < 0.0001, 
with NS indicating no significance. Where a representative experiment is shown, 
the number of times the same result was observed in independent experiments is 
detailed in the corresponding figure legend. 
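
The four significance levels listed above map a P value to an asterisk label. The helper below is a small sketch of that mapping with strict inequalities, as written in the text; it is illustrative and not taken from GraphPad Prism.

```python
def significance_label(p_value):
    """Map a P value to the asterisk notation used in the figures."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("P value must lie between 0 and 1")
    thresholds = ((0.0001, "****"), (0.001, "***"), (0.01, "**"), (0.05, "*"))
    for threshold, label in thresholds:
        if p_value < threshold:
            return label
    return "NS"


labels = [significance_label(p) for p in (0.2, 0.03, 0.005, 0.0005, 0.00005)]
```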

Reporting summary. Further information on research design is available in 
the Nature Research Reporting Summary linked to this paper. 


Data availability 
The data supporting the findings of this study are available within the paper and 
its Supplementary Information. Source Data for Figs. 1-4 and Extended Data 
Figs. 1-10 are available with the online version of the paper. Data are available from 
the corresponding author upon reasonable request. 

Extended Data Fig. 1 | Mannose affects cell growth and metabolism. 

a, Growth curves of Saos-2 cells supplemented without (control) or with 
25 mM of hexoses (man, mannose; gal, galactose; fru, fructose; fuc, fucose; 
glc, glucose). b–d, Growth curves of PA-TU-8902 (b) and A549 cells (d) 
in DMEM alone or with an additional 25 mM mannose; K562 cells in 10% 
FBS RPMI-1640 medium with or without 11.1 mM mannose (c). 
e, Western blots showing the levels of phospho-AMPKα (T172) and total 
AMPKα after 5, 15, 30 and 45-min incubation of U2OS-E1a with standard 
medium or medium supplemented with 25 mM mannose. f, Growth 
curves of SKOV3 and RKO cells in DMEM alone or with an additional 
25 mM mannose. g, Levels (expressed as peak area per microgram of 
protein) of 2-deoxyglucose-phosphate (2-DG-P) in RKO, SKOV3, SAOS-2 
and U2OS-E1a (U2OS) cells incubated with 10 mM 2-deoxyglucose for 
6 h in the presence of 25 mM mannose in the culture medium (DMEM). 
Data are the average of three technical replicates and are representative 
of two independent experiments. h, Levels (measured as the percentage 
of peak area per microgram of protein, on a log₁₀ scale) of 
hexoses-6-phosphate (hexoses-6P) in U2OS-E1a cells after 6 h incubation in 10% 
dialysed FBS with 5 mM glucose in DMEM either with or without 5 mM 
mannose. i, Peak area per microgram of protein of hexoses-6-phosphate 
in U2OS-E1a cells incubated for 6 h in DMEM, with or without an 
additional 25 mM of the indicated sugars. j, k, Peak area per microgram 
of protein of intracellular non-phosphorylated mannose (j) or relative 
amounts of glucose (k) after a 6 h incubation of U2OS-E1a cells in 5 mM 
glucose in DMEM, with or without 5 mM mannose. l, Relative amount 
per microgram of protein of non-phosphorylated glucose m + 2 after 
6 h incubation of U2OS-E1a cells in glucose-free DMEM either with 
5 mM 1,2-¹³C₂-D-glucose alone or with 5 mM 1,2-¹³C₂-D-glucose and 
5 mM ¹³C₆-D-mannose. m–p, Peak area per microgram of protein of 
glyceraldehyde-3-phosphate (GA3P) (m), phosphoenolpyruvate (PEP) 
(n), lactate (o) and UDP-GlcNAc (p) in U2OS-E1a cells after 6 h 
incubation in DMEM, with or without an additional 25 mM of the 
indicated sugars. n = 3 independent experiments (a–d, f, h–p). Data 
are representative of two independent experiments (e). All data are 
mean ± s.e.m. and were analysed by one-way ANOVA (i, m–p), two-way 
ANOVA followed by Tukey's multiple comparisons (a) or paired 
two-tailed Student’s t-test (h, j–l). *P < 0.05, **P < 0.01, ***P < 0.001. 

(b), lactate (c), glyceraldehyde-3-phosphate (glyceraldehyde-3P) 

Extended Data Fig. 3 | Mannose has a rapid effect on cellular 
metabolism. Metabolite content (expressed as peak area per microgram 
of proteins) of hexoses-6-phosphate (a), ATP (b), AMP (c), 
fructose-1,6-bisphosphate (F1,6-BP) (d), ribose-5-phosphate (e) and UDP-GlcNAc 
(f) in U2OS-E1a (U2OS) cells after 5 min treatment with or without 
25 mM mannose in the culture medium (DMEM). n = 3 independent 
experiments, presented as mean ± s.e.m. and analysed by a two-tailed 
unpaired t-test. *P < 0.05, ***P < 0.001. 


© 2018 Springer Nature Limited. All rights reserved. 




Extended Data Fig. 4 | Mannose sensitizes cells to chemotherapy-induced 
cell death via the intrinsic apoptotic pathway. a, b, Percentage 
of propidium-iodide-positive KP-4 cells after 24 h treatment with 40 µM 
cisplatin (a) or 1 µg ml⁻¹ doxorubicin (b) in the presence or absence of 
25 mM mannose. c, Percentage of U2OS-E1a propidium-iodide-positive 
cells after 24 h treatment with or without 10 µM cisplatin in DMEM with 
or without 25 mM of the indicated additional sugars. d, Percentage of 
U2OS-E1a propidium-iodide-positive cells after 24 h treatment with or 
without 1 µg ml⁻¹ doxorubicin, with or without 25 mM mannose and 
with or without 50 µM zVAD-FMK. e, f, Fold increase of the percentage 
of propidium-iodide-positive Saos-2 (NTC, caspase-8 and FADD 
CRISPR) cells upon 24 h treatment in DMEM with or without 1 µg ml⁻¹ 
doxorubicin (e) or 10 µM cisplatin (f) and with or without 25 mM 
mannose. g, Western blots showing the levels of caspase-8, FADD and 
ERK2 in NTC, caspase-8 and FADD CRISPR Saos-2 cells. h, Percentage 
of empty and Bax/Bak CRISPR U2OS-E1a propidium-iodide-positive 
cells with or without 1 µg ml⁻¹ doxorubicin and with or without 25 mM 
mannose. i, Western blots showing the levels of Bax, Bak and HSP90 in 
empty and Bax/Bak CRISPR U2OS-E1a cells. j, Western blots showing 
the levels of Bim, Noxa, Bad, Bax, Actin and HSP-90 in U2OS-E1a cells 
after 24 h treatment with or without 10 µM cisplatin, with or without 
25 mM mannose and with or without 50 µM zVAD-FMK. The HSP-90 
blot is identical to the one shown in Fig. 2f as the blots are from the same 
experiment. k, Western blots showing the levels of Mcl-1 and Bcl-XL in 
U2OS-E1a cells after 48 h with or without 10 µM cisplatin, with or without 
25 mM mannose and in the absence or presence of 10 µM MG132 as 
indicated. l, m, PCR with reverse transcription (RT-PCR) showing the 
levels of BCL2L1 (Bcl-XL) (l) and MCL1 (m) mRNAs in U2OS-E1a cells 
after 48 h treatment with 10 µM cisplatin alone, 25 mM mannose alone, 
or both 10 µM cisplatin and 25 mM mannose, in the presence of 50 µM 
zVAD-FMK. n, o, Percentage of U2OS-E1a (empty), Mcl-1 and Bcl-XL 
overexpressing propidium-iodide-positive cells after 24 h treatment with 
or without 1 µg ml⁻¹ doxorubicin and with or without 25 mM mannose. 
p, q, Western blot showing the levels of Mcl-1, Bcl-XL and HSP-90 
in U2OS-E1a (empty), Mcl-1 and Bcl-XL overexpressing cells. n = 3 
independent experiments (a–c, e, f, h, l–o). Data are representative of 
two independent experiments (g, j, k) or one experiment (i, p, q). In 
d, n = 3 independent experiments (each involving technical triplicates) 
(−zVAD-FMK) and n = 2 independent experiments (each involving 
technical triplicates) (+zVAD-FMK). All data are mean ± s.e.m. and were 
analysed by two-way ANOVA with Bonferroni correction (a–f, h, n, o). 
*P < 0.05, ***P < 0.001. 

Extended Data Fig. 5 | Sensitization to cell death by mannose is 
connected to Bcl-2 family members. a, Percentage of control (NTC) and 
Noxa CRISPR U2OS-E1a propidium-iodide-positive cells upon 24 h 
treatment with or without 10 µM cisplatin and with or without 25 mM 
mannose. b, Percentage of U2OS-E1a propidium-iodide-positive cells 
upon 24 h treatment with 10 µM WEHI539 (a Bcl-XL inhibitor) or 10 µM 
ABT-199 (a Bcl-2 inhibitor), with or without 25 mM mannose. n = 4 
independent experiments (a); n = 3 independent experiments (b). 
All data are mean ± s.e.m. and were analysed by two-way ANOVA with 
Bonferroni correction (a) and two-tailed unpaired t-test (b). ***P < 0.001; 
****P < 0.0001; NS, not significant. 

Extended Data Fig. 6 | Mannose affects cell proliferation and the 
uptake/retention of ¹⁸F-FDG, but it does not affect animal weight. 
a, Mannose levels in the plasma after 60 min in mice treated with a single 
dose of 200 µl water or 20% mannose in water. b, c, CD1-nude mice were 
transplanted with KP-4 cells subcutaneously and tumours were grown 
for 14 days. PET and MRI scans were performed for mice treated with 
200 µl of water or 20% mannose in water by oral gavage 20 min before 
injection of [¹⁸F]FDG into the tail vein. b, Quantification of [¹⁸F]FDG 
uptake by tumours represented in average percentage injected dose per ml 
(%ID ml⁻¹) ± s.d. c, Volume of each tumour at the time of the PET and 
MRI scans. Data were analysed by unpaired two-tailed Student's t-test. 
d, CD1-nude mice were injected with KP-4 cells subcutaneously and 
treated with normal drinking water or 20% mannose in the drinking 
water, plus the same treatment (either normal water or 20% mannose) 
by oral gavage three days a week from the third day after tumour 
transplantation. Shown is the quantification of [¹⁸F]FDG uptake by the 
tumour and different organs represented in averaged kBq ml⁻¹ ± s.d. Data 
were analysed by unpaired two-tailed Student’s t-test. e, CD1-nude mice 
were injected with KP-4 cells subcutaneously and treated with normal 
drinking water or 20% mannose in the drinking water, plus the same 
treatment (either normal water or 20% mannose) by oral gavage three 
days a week from the third day after tumour transplantation. The weight 
of mice was recorded at the indicated times. f, g, CD1-nude mice were 
injected with KP-4 cells subcutaneously and treated with normal drinking 
water or 20% mannose in the drinking water, plus the same treatment 
(either normal water or 20% mannose) by oral gavage three days a week 
from the third day after tumour transplantation. f, Images of BrdU 
sections representing tumours 
in control (left) and mannose-treated (right) mice. g, Quantification of 
BrdU-positive cells per section in control tumours (n = 4) and mannose-treated 
(n = 4) tumours. Five representative images per tumour were 
analysed. h, CD1-nude mice were injected with KP-4 cells subcutaneously 
and tumours were grown for 10 days before treatment with mannose and/or 
doxorubicin (doxo) was started. Mice received normal drinking water 
(ctrl and doxo) or 20% mannose in the drinking water (man and doxo 
+ man) together with the same treatment by oral gavage three times per 
week. Doxorubicin treatment started on day 32 and mice received 5 mg kg⁻¹ 
by intraperitoneal injection once per week. The weight of mice in all 
groups was recorded throughout the experiment. The number of mice for 
each experiment is as follows: n = 3 per group (a), n = 5 (−mannose), 
n = 4 (+mannose) (b–d); n = 10 (h). In a, c and h data are mean 
± s.e.m. Data were analysed with a two-tailed unpaired t-test (c). 
*P < 0.05, **P < 0.01. 

Extended Data Fig. 7 | PMI levels dictate the response to mannose. 
a, PMI expression levels correlate with PMI activities. PMI activities were 
measured in eight different cell lines using coupled enzymatic reactions. 
Graph shows the OD340nm measured at 2 h, reflecting the amount of 
NADPH/H⁺ produced by the reactions. Results from three independent 
experiments were normalized relative to PMI activities measured in 
Saos-2 cells and represent mean ± s.e.m. b–g, MPI knockdown sensitizes 
cells to mannose. b, Western blot of SKOV3-transfected cells showing the 
levels of PMI and actin after 48 h of transient transfection with siRNAs. 
Growth curves of SKOV3 (c), RKO (f) and IGROV1 (g) in regular DMEM 
supplemented with or without 25 mM mannose after transient transfection 
for 48 h with two NTC and two MPI-targeting siRNAs. d, e, Western 
blot showing the levels of PMI and ERK2 in RKO (d) and PMI and actin 
in IGROV1 (e) 48 h after siRNA transfection. h–k, Overexpression of 
PMI causes resistance to the growth-suppressing and death-promoting 
effects of mannose. PMI expression in U2OS-E1a (h) and Saos-2 (i) cells 
was confirmed by western blotting. j, Saos2-PMI cells and control cells 
(Saos-2) were plated in the presence or the absence of 25 mM mannose 
and cell numbers were counted at the indicated times. k, Percentage of 
propidium-iodide-positive U2OS-E1a control cells (vector) or U2OS-E1a 
cells overexpressing PMI (PMI) after 24 h treatment with or without 
1 µg ml⁻¹ doxorubicin in the presence or absence of 25 mM mannose. 
l–o, Percentage of lactate (l), GA3-P (m), α-KG (n) and malate (o) 
metabolite content (peak area) in SKOV3 cells transfected with siRNA for 
48 h before 6 h incubation in complete DMEM medium with or without 
supplementation of 25 mM mannose. n = 3 independent experiments 
(a, c, f, g, j, l–o). Data are representative of two independent experiments 
(b, d, e, k), or one experiment (h, i). Data are mean ± s.e.m. and were 
analysed by two-tailed unpaired t-test (a), two-way ANOVA with Tukey 
correction (l–o) or multiple unpaired two-sided t-test with Holm-Sidak 
correction (j). *P < 0.05, ***P < 0.001, ****P < 0.0001. 
Extended Data Fig. 8 | PMI levels dictate mannose sensitivity, and
the weight of animals containing syngeneic allografts is not affected
by treatment with mannose. a, b, Mpi knockdown causes growth
retardation upon mannose treatment in a dose-dependent manner.
a, B16-F1 cells infected with NTC or shRNA against Mpi were treated
with (blue columns) or without (white columns) 25 mM mannose for 72 h,
after which the number of cells was determined. Data are mean ± s.e.m.
n = 3 independent experiments. b, The reduction in PMI was confirmed
by western blotting. c, Western blot showing a reduction in PMI after Mpi
knockdown in LLC cells. The data are relative to the experiments shown
in Fig. 4g, j, k. d, B16-F1 cells infected with NTC or shRNA against Mpi
were treated with 25 mM mannose, 25 mM glucose, 25 mM galactose, 25 mM
fructose or 25 mM fucose. At 24, 48 and 72 h after hexose treatment, the
number of cells was determined. Data are mean ± s.e.m. e–h, Weights of
mice injected with syngeneic cell lines: B16 shNTC (e), B16 shPMI (f),
LLC shNTC (g) and LLC shPMI (h); the mice were given normal drinking
water or drinking water supplemented with 20% mannose (n = 10 mice
per group). All data are mean ± s.d. unless otherwise stated. In d, n = 4
independent experiments were analysed by two-way ANOVA followed by
Tukey’s multiple comparisons. The data in b and c were performed only
once. ***P < 0.001.
Extended Data Fig. 9 | PMI levels vary in human tumours, but do
not predict overall survival and the weight of mice is not affected by
mannose treatment in models of colorectal cancer. a, Images of PMI
expression for each TMA, representing one negative sample (left), one
low expression (middle) and one high expression (right). One sample
came from each of the ovarian, breast, prostate, colorectal or renal TMAs
as indicated. b, c, Kaplan–Meier curves showing cancer-specific survival
based on PMI levels in n = 698 patients with stage I–IV colorectal cancer
(b) or n = 316 patients with primary operable breast cancer (c). d, Table
showing the association of PMI and cancer-specific survival. Histoscores
were split into high and low expression using the ROC curve analysis for
each tumour type. Log-rank analysis (two-sided) was used to compare PMI
and cancer-specific survival using SPSS (version 22). e, Mean weight of
28 mice during treatment with AOM and DSS, with or without additional
mannose treatment. f, Weight of aged Villin*°=8 Apc" Kras$P/* mice 
during treatment with or without mannose. Data are mean of each group 
(n = 7 mice, −mannose; n = 8 mice, +mannose).
Extended Data Fig. 10 | Mannose does not affect amino acid and fatty
acid uptake, nor does it significantly affect serine and glycine levels,
endoplasmic reticulum stress or proteasome activity, but it does affect
autophagy, transcription and translation in a PMI-dependent manner.
a, Exchange rates of amino acids between the indicated cells and their
media measured after 48 h treatment with or without 25 mM mannose.
Results are presented as peak area per microgram of protein per hour
and are representative of one experiment. Shown is the mean of six
technical replicates. b, Levels of ¹³C₁₆-palmitate (expressed as peak area
per microgram of protein) in U2OS-E1a cells incubated with 50 μM
¹³C₁₆-palmitate, conjugated with 10% fatty-acid-free BSA, for 24 h in the
presence or absence of 25 mM mannose in the culture medium (DMEM).
n = 2 independent experiments (each involving six technical replicates).
Data are mean ± s.e.m. c, d, Levels of ¹³C₃-serine (c) and ¹³C₂-glycine
(d) (expressed as peak area per microgram of protein) in U2OS-E1a
cells incubated with 25 mM ¹³C₆-glucose for the indicated time in the
presence or absence of 25 mM mannose in the culture medium (DMEM).
n = 3 independent experiments. e, f, Distribution of isotopologues of
intracellular serine (e) and glycine (f) in U2OS-E1a cells incubated with
25 mM ¹³C₆-glucose for 24 h in the presence (mannose) or absence
(ctr) of 25 mM mannose in the culture medium (DMEM). n = 3
independent experiments. g, Mannose has no effect on the unfolded
protein response. U2OS-E1a cells were treated with 25 mM mannose for
16 and 24 h. 2-Deoxy-D-glucose (2DG) and tunicamycin (TM) serve as
positive controls. Induction of Bip (also known as GRP78) is a read-out
of endoplasmic reticulum stress. The data are representative of three
independent experiments. h, Proteasome activity is not affected by
mannose. Mannose-sensitive U2OS-E1a cells were incubated in either
DMEM or DMEM containing 25 mM mannose before measurement
of proteasome activities. Cells were also treated with 10 μM MG132 as
a control for proteasome activity. n = 3 independent experiments.
i, j, U2OS-E1a cells (i) or U2OS-E1a (U2OS) cells overexpressing PMI (j)
were treated with 25 mM mannose for the indicated times in the presence
or absence of 20 μM chloroquine (CQ) (4 h). Western blotting was
undertaken to detect LC3-B and actin. The data are representative of three
independent experiments. k, l, Relative incorporation of ³²P UTP (k) or
³⁵S methionine (l) into U2OS-E1a control cells (vector) or U2OS-E1a cells
overexpressing PMI (PMI) in the presence or absence of 25 mM mannose.
Where indicated, 5 μM actinomycin D was used to inhibit transcription
or 100 μg ml⁻¹ cycloheximide was used to inhibit translation. In k, n = 3
independent experiments. In l, n = 3 independent experiments
(control and mannose) and n = 2 independent experiments (CHX).
m, n, U2OS-E1a cells infected with lentiCRISPR-NTC 1, NTC 2, ATG5
or ATG7 were treated with 25 mM D-(+)-mannose for 72 h. m, The
number of cells counted after the 72-h treatment; the results represent the
mean of one independent experiment performed in triplicate. n, Western
blots show loss of LC3 lipidation and p62 accumulation in ATG5 and
ATG7 knockout cells. Data were analysed by two-tailed unpaired t-tests.
*P < 0.05, ***P < 0.001. Unless otherwise stated, data are mean ± s.e.m.
KATHRYN SAEB-PARSY


CAREERS 




WORK-LIFE BALANCE 


Kourosh Saeb-Parsy with his children, Nadia and Kiana, at a Mini Mudder obstacle course. 


Fathers in science 


Five scientist dads describe how they and their partners juggle their family and careers. 


Be pragmatic and flexible


Dean of the faculty of life sciences 
at University College London. His 
daughter Milly is 15; his son Alexis 13. 


Milly was due at the end of March 2003 but 
arrived in December 2002 with a 50% chance 
of survival, and a 50% chance of having a 
serious disability. 

I was about to set up my own research 
group and achieve full independence. I'd 
been awarded a Wellcome Senior Clinical 
Fellowship and we were soon to move house. 


We spent 99 days in the neonatal intensive 
care unit, a distressing environment. 

I returned to work after three months and 
the focus that first year was hiring postdocs 
and getting back into the swing of things. 
Then things got more normal. We stopped 
worrying about Milly being rushed back into 
hospital. We started thinking about child 
care and had our first family holiday. 

My wife, Rebecca, is a medical oncologist.
We try to divide our child-care responsibilities
50–50. The live-out nannies we had
before the children started school (a fantastic
privilege to be able to afford, but also a
significant cost) usually arrived around
8 a.m. and left around 6–7 p.m. Our working
pattern then had clear starts and stops.
We often had to leave meetings to get home. 

Even now, making meetings outside
normal working hours is tough. The children
go to school at 8 a.m. so I can just about
make 8.45 a.m. You have to be pragmatic
and flexible about what can be cancelled or
rescheduled and what can't. It’s important to
keep talking to your partner so you don't get
out of kilter with sharing child care. Otherwise
you lose that feeling of shared responsibility.

Milly plays the flute and several years ago 
her primary school gave us six days’ notice 
of a lunchtime concert, her first solo perfor- 
mance. It was also Brain Awareness Week 
and I was giving a lunchtime public lecture. 
Rebecca was in clinic. 

I wanted to attend the concert, but also 
to deliver the lecture. I made the right decision.
My colleague, Sarah-Jayne Blakemore,
stepped in and her wonderful lecture was
much better than the one I would have
given. And I got to see Milly perform.

People have occasionally said they chose 
my research group because of the respect I 
try to foster for work–life balance. I have a
reputation for my out-of-office messages, 
which I use to communicate clearly (and 
humorously) that life outside work can 
sometimes take priority, and that family 
holidays are something I enjoy. I started 
getting congratulatory e-mails about them. 
Now I face pressure to come up with a fun- 
nier one whenever we go away. My last one 
read: “A long hot summer, epic collapse of 
global political discourse, impending Brexit 
chaos, new NSS scores, creeping REF prepa- 
ration and a crescendo of short-notice UKRI 
deadlines. This can mean only one thing — 
it’s time to go on vacation! I’ve gone with my 
wife and children to play, talk, swim, read, 
eat, drink, think, snooze and relax. And you 
should too — it’s August, after all!” 

A core message that I repeat is how much 
I value my family and that sometimes they 
will come first. But I’m also sensitive to the 
fact that not everybody wants to talk about 
their family or personal life at work. 

No single solution works for everybody. I 
often get asked what the best time is to have 
children. There’s never a single right time. 
There are trade-offs at all stages. Also, some
people don’t want, or can't have, children and
you need to have sensitivity and respect for
everyone.

At her inaugural professorial lecture, my 
colleague Mairéad MacSweeney talked about 
solidarity across genders and that men need 
to speak up more for flexible working. Until 
both men and women articulate that parent- 
ing is a normal part of working life, it won't 
be accepted. 

Balance is a verb, not a noun. And work–life
balance is not something you fix. It’s
something you're constantly practising and 
rehearsing and reflecting on. 


David Smith’s son learns to grow crystals. 


KOUROSH SAEB-PARSY 
Align priorities to avoid conflict


Transplant surgeon at Cambridge 
University Hospitals, UK. His daughter 
Nadia is 8 and his daughter Kiana is 7.


My wife, Kathryn, is a primary-school teacher 
and has frequent evening or weekend social or 
work commitments. We work as a team and 
divide and conquer what needs to be done. 

So I might pick the girls up from school 
or drop them off for guitar lessons because 
Kathryn is at work. She picks them up probably 
75% of the time — but we don’t keep tabs or 
scores. If I can’t pick them up from school, it
is because I am doing something that Kathryn 
believes is good for our team. 

I don't buy the idea that you have to sacrifice
family life for your profession. Historically, that 
might have been the case, but times are chang- 
ing, and I tell my students that you have to aim 
for both. 

When there is competition between family 
and work, it’s because everyone's priorities are 
not aligned. If you do start sacrificing time with 
your children, you're on a slippery slope that's
not sustainable in the long term. I told Nadia 
and Kiana what I do when they were each about 
3. I described what a transplant is. We've also 
talked about organ donation, where kidneys 
come from and about difficult situations in 
which patients have died. 

I did this so that they understand why I'm not 
always around, usually because I'm on call. If I
can't be there for lunch on Christmas Day, for
example, I tell them why — saying it's because 
a patient has been given an amazing present, a 
new kidney. 

They also know about my research. They love 
coming to the lab and meeting my group. We
play (safely) with pipettes and solutions. I get
them to label things. I don’t want my science to
be a black box to them.

I mentor people at Cambridge. We often talk 
about careers and family. One asked how he 
could finish his PhD with a second child. We 
discussed the importance of perspective and 
how, in the grand scheme of family life, work 
events should not always be prioritized over ‘life’.

This is symptomatic of the wider 
issue — seeing parenting as a problem, rather 
than as one of the most wonderful events in 
your life. 

We've been impressing on our daughters that 
they can be anything they want, and are keen to 
introduce them to strong female role models. 


DAVID SMITH 
Cement your relationships


Chemist at the University of York, UK. 
His son is 5.


My husband, Sam, has cystic fibrosis, and after 
his lung transplant in 2011, we were able to 
adopt. When our son arrived in 2015, I was in 
my early 40s and a full professor. 

We wanted to split the adoption leave into six 
months each, but the social worker at the adop- 
tion agency preferred one primary carer for the 
first 12 months to help form attachments. Sam 
earned less than me, so he took the full year. I 
was disappointed and a bit jealous, but wanted 
to maximize my involvement and took two 
months of full leave and then worked at 80% of 
a full-time post for a year. 

Many scientist mums have a hard job juggling 
work and family. I hope that more fathers will 
step up to caring responsibilities, and that this 
situation will change. 

Our son knows that his daddy does science 
and wears a lab coat. He's really into volcanoes,
both fascinated by and a bit scared of them. 

We went to Iceland recently. When we got 
home, we made a plaster-cast volcano, then 
added bicarbonate of soda, washing-up liquid
and red food colouring into it. We tasted stuff 
for its acidity, to see what would cause an erup- 
tion. He loved that. 

The chemistry department has an Athena 
SWAN Gold Award (a UK gender-equality 
charter) and meetings take place in core hours, 
so there shouldn't ever be one that starts before
9.45 a.m., and they should finish by 4 p.m. We
do have 9 a.m. lectures, and I have to drop my 
son off at school at 8.45 a.m., so I can't give those. 
But the department makes it easy for that to be 
arranged. 

I used to spend around 50 nights a year away
at conferences; now, it’s more like 10, and never
more than one night at a time. I recently did a
PhD viva (oral exam) in Bristol. I left at 8 a.m.
and got back at midnight, spending nine hours
on a train. Previously I would have stayed over,
but you have to trim those bits of academic life
to enable yourself to cement relationships to
support family life.


DAVID K. SMITH

I’ve just turned down conference invita- 
tions to India, Canada and the United States. 
Employers consider business travel as part of 
your working hours. But it’s still time away from 
your family. 

I think the conversations that men often 
have at conferences can be corrosive. They talk 
about how big their research groups are and 
how many papers they have published, and not 
about whether they took some time off when 
their kids came along. That needs to change. 

In academia, many people work way beyond 
any idea of notional contracted hours. In my 
experience, when you become a parent, you
realize how many ‘soft hours’ you were putting
in: those hours when you don't think you're at
work. You're sitting with your partner, but you're
completely absent and catching up on e-mails 
on your phone. 

Now I try to compartmentalize work and 
home life, but this can lead to massive e-mail 
crises, especially when I am trying to do the 
role working 35–40 hours a week. If I spent
5 minutes on every e-mail I receive, that would 
be my whole working week gone. 


BRIAN CAHILL 
Organize your day carefully


Research programme manager at the 
University of Edinburgh, UK. His sons 
are 5 and 3. 


My wife, Lini, is from Indonesia and a tele- 
vision producer. We were in Germany for 
ten years, until I moved to the University of 
Edinburgh in October. 

We had been in Germany for five years 
when our first son was born. Already knowing 
the country and some German helped us to 
make some of the administration around par- 
enting, such as registering births, easier. Also, 
if you're dealing with a midwife in the middle 
of the night, it’s good to be able to speak the 
local language. A scientist who speaks only 
English and moves to a foreign country can 
still function as a scientist. But being able to 
speak the local language outside the lab bubble 
can help in getting things done. 

Before we had kids, I would get up at
6 a.m., leave at 7 a.m., and be at work for
7.30 a.m. When we had our first child, I still
got up at 6 a.m. and mostly reached work by
8.30 a.m. With two children, it was often
around 9 a.m. I dropped them off at kindergarten
at 7.45 a.m. It’s important to be part of
your children’s kindergarten life (Lini picked
the boys up in the afternoon). Otherwise
you're excluded from a huge part of their lives.


In Germany, the working day starts at 7 a.m., 
so by 9 a.m. my colleagues were often having
coffee. One thing we had there, which I haven't 
seen anywhere else in academic life, is flexi- 
time, which was very useful. My workplace 
strongly discouraged weekend working, which 
was also great. 

I struggle with switching off from work. 
There's always someone sending important 
e-mails at 5.30 p.m. on Fridays. It’s not good to 
let people down in research, but I personally 
can't stay up every night until midnight when
I get up at 6 a.m.

I've let go of reviewing, which is a basic part 
of being a researcher. You say you will do it, but 
it’s often impossible to find the time. Writing 
papers and project proposals are significant for 
your career, but many aren't done during work 
time because of meetings, so you often do the 
work at home, which is harder with kids, who 
need our attention. 

At home in Germany, we devoted a lot of 
time and energy to language learning, reading 
books and singing songs. Otherwise, our chil- 
dren would not pick up English or Indonesian. 
This did put a lot of responsibility on the kin- 
dergarten to teach them German, which they 
picked up alongside English and Indonesian. 

The parental leave we got in Germany gave 
us 14 months between each parent. The maxi- 
mum that one parent can take is 12 months. 
But you can also take it flexibly and extend it 
over 2.5 years by taking half days, for example. 

Lini’s parents are in Indonesia, and mine are 
in Ireland. My kids knew a lot of older people 
in the housing cooperative where we lived in 
Göttingen and had a grandparent-like relationship
with them. There was a real sense of
community there. 

It's not completely impossible to be a good 
parent and a good researcher, but you have to 
organize your day. 


PAUL MARTIN 
Become more efficient


Cell biologist at the University of Bristol, 
UK. His daughter Matilda (Tilly) is 21, 
and his daughter Charlotte is 15. 


When I mentioned that I was being interviewed 
for this article, some colleagues said that men 
don't need an article like this, that it’s tougher 
for women. There is no doubt that my wife's 
career was slowed down more than mine was 
when the children were younger. If they were 
off sick from nursery, they tended to want their 
mum more than they wanted me, for example. 
Kate is also a cell biologist and head of the 
school of biochemistry at Bristol and, as the 
kids have got older, she has been able to move 
back up a gear with her career. 

When Kate was having Tilly, a friend told
her that you publish one less paper per child
per year. There's some truth in that. In the early 
days, we shared child-care responsibility for 
Tilly in the evenings, although it worked out 
that I did two evenings and Kate did three, so 
there was an imbalance. If one of my child-care 
evenings clashed with a meeting of the London 
Fly Club (part of the Drosophila research com- 
munity), I'd take Tilly along. She was about 3 
or 4 at the time. 

I recently came across papers that I had 
given Tilly one evening during a club meeting. 
They were subdivided into squares and she'd
drawn pictures of the speakers’ slides in them.
One was of a fly. Another was of a gel. It was
a lovely memory, but it made me a little sad. I 
thought: “What kind of awful parent was I?” In 
my opinion, we cannot be decent parents if we 
work until 9 p.m. I remember Kate sometimes 
calling and telling me to come home. But when 
you're setting up a lab, you sometimes have to
work long hours. Science is highly competi- 
tive. As a parent, you somehow have to become 
more efficient. 

Some colleagues have trimmed down 
important things such as tea-room discus- 
sions with potential collaborators, and miss 
seminars on areas that are slightly outside 
their remit. 

Just working at a university puts pressure on 
your kids. But there are positives. Tilly recently 
graduated with a biology degree. As a student,
she would check things with us, not the details 
of lectures but rather the process, and what you 
need, say, to get a 2:1 (a high honours degree 
classification in the United Kingdom, one 
down from a first). 

One perk of being a scientist is that you can 
introduce your kids to useful and interesting 
people. Tilly just dropped by to meet one of 
my PhD students, a dentist. She's interested in 
that as a career. 

Our daughters are impressed if something 
we do relates to their world. Earlier this year, 
for instance, a paper I co-authored in Devel- 
opmental Cell was featured in The New York 
Times and received a lot of Twitter action. 

For a time, I definitely was not as competitive
as colleagues who did not have children. I feel 
that I have a more worldly perspective now — 
that I’m not engrossed in science all the time. 

What forces me to step back (and also to 
realize my ignorance) is having a daughter who 
is revising for her biology GCSE (exams taken 
by secondary-school students aged 16 in Eng- 
land, Wales and Northern Ireland) and asks 
me about the difference between xylem and 
phloem. I tell her I know nothing about plants, 
although I do know lots about inflammation. 
She says: “You're supposed to be a professor, 
dad.” I say: “Let’s look it up.”


INTERVIEWS BY DAVID PAYNE 

These interviews have been edited for length, 
clarity and style. Would you like to participate in a
follow-up article about scientist mothers? E-mail 
naturecareerseditor@nature.com 
SCIENCE FICTION


SO, ONE OF THOSE TINY ALIEN
SPACESHIPS HAS FLOWN INTO
YOUR HOUSE. NOW WHAT?


A self-help guide.


BY LAURA PEARLMAN


The best strategy is not to let them in, of
course, but maybe you left your chimney flue
open or didn't notice a hole in a window
screen, and now there's a tiny alien spaceship
flitting around your living room or kitchen or
swooping under your dining-room table, and
there’s really no point in self-recrimination
or speculation about which of your kids might
have propped the front door open while they
all loaded their cars up after Thanksgiving
weekend, leaving you to rattle around alone
in your empty nest once again.

However it happened, it’s just you in the
house with your two cats and something that
looks like a miniature flying neon-pink
Roomba. The first thing you need to do is take
three deep calming breaths, unless of course
your cat is making that weird trilling noise
they make when they see a bird, in which case
the first thing you need to do is get your cat
the hell out of there before it tries to pounce
and the aliens blast it with one of their
energy cannons.

Once you've got your cat safely shut in
another room (maybe your bedroom, where your
other cat is already cowering under the bed)
and cleaned up any scratches you might have
acquired in the process, you can go back to
step one. Take three deep breaths and remind
yourself that the vast majority of people who
have these encounters survive, usually with
only minimal injuries.

Try to stay out of the aliens’ way. Let them
take whatever they want. No one knows for
sure what motivates them, but the prevailing
theory is that they're on some kind of
recreational scavenger hunt, collecting a
seemingly random assortment of items such as
rubber bands, peanuts, crayons, lipstick,
toenail clippings, blood samples and tears.
Don't fall into [...] are going to do with
these items. They're aliens, not witches.

If you hear yowling or snarling com- 
ing from your bedroom, and you begin to 
regret confining your agitated cat and your 
terrified cat in the same room, undoing all
the progress they've made in learning to get
along together over the past three years — 
just ignore it. You need to focus on the aliens. 

Watch the ship and follow its lead. If it’s 
repeatedly bumping against an interior door, 
cabinet or drawer, it wants what’s inside, and 
your best course of action is to open the door 
or cabinet or drawer or whatever before the 
aliens get impatient and reduce your collec- 
tion of stemware to a puddle of molten glass, 
including those nice champagne flutes you 
were planning to use just as soon as you had 
something to celebrate. 

If you have a fire extinguisher, keep it to 
hand as you follow the aliens around the 
house. 

If the ship is hovering near your eyes, 
remove your contact lenses immediately and 
place them on the nearest table or countertop. 
Seriously. You don't want the aliens to extract 
them directly. If the ship is hovering near any
other part of your body, remain calm and let 
them harvest whatever biosample they need. 
Chances are, it'll be something small. 

Apply direct pressure to the incision 
site(s). If the bleeding continues for more 
than five minutes, call 911, and 
an ambulance will be dispatched 
to your location. Be sure to 
mention that aliens are present 
and stay on the line until they 
leave. The aliens will incinerate 
anyone attempting to enter or 
exit, so it’s important that the 
paramedics wait outside until 
the aliens are gone. 

On a related note, keep all 
your exterior doors and win- 
dows closed and locked. 

If the ship wants to enter the 
room in which you've confined 
your cats, you really have no 
choice but to let it in. Your cats 
should be terrified enough by 
now that they'll stay out of the 
aliens’ way, and as long as they 
do, they'll be fine. The aliens 
seem to be completely uninter- 
ested in cats, which is one way 
we know they are aliens. 

Eventually, the ship will 
hover near an exterior door or 
window. This means the aliens 
have collected everything they 
set out to collect, and all you need to do at 
this point is let them out. 

Congratulations! You’ve survived your 
first alien encounter. Extinguish any smoul- 
dering fires. If you've called 911, let them 
know it's safe for the paramedics to come in.
Give your cats some treats. Blot any blood- 
stained areas of carpet with a clean, dry 
cloth, then apply some club soda and blot 
some more. Text those friends you've been 
meaning to get together with, and pop over 
to the store to pick up a bottle of champagne. 

Just remember to close the door behind 
you on your way out. 


Laura Pearlman’s work has appeared
in Shimmer, Flash Fiction Online,
Intergalactic Medicine Show and a handful 
of other places. You can find her on Twitter 
at @laurasbadideas. 